JRC-ACQUIS Multilingual Parallel Corpus, Version 2.2

This page is obsolete.

Please go to JRC's Language Technology resource page:

https://ec.europa.eu/jrc/en/language-technologies
You will find many more useful linguistic resources there.

Version 3.0 is now available. Go to http://langtech.jrc.it/JRC-Acquis.html to get this latest and extended version.

This is the download page for Version 2.2 of the aligned multilingual corpus JRC-ACQUIS. The dataset contains resources for the following languages: Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish.

News: Version 2.2 contains alignment data. The ACQUIS corpus has been reduced to those texts that are actually in the original language (for more information, read the "news" page).

Information about the corpus and the alignment

  1. Documentation

Download:

  1. AC Corpus (by language)
  2. AC Aligned Corpus using Vanilla aligner
  3. AC Aligned Corpus using HunAlign

By downloading these resources, you agree to the usage conditions.

This multilingual parallel corpus was compiled by the Language Technology team of the European Commission's Joint Research Centre (JRC) in the context of the workshop "Exploiting parallel corpora in up to 20 languages", held in Arona, Italy, on 26 and 27 September 2005.


Page last updated 2006-05-15, LT Group - JRC
