This directory contains the result of Vanilla alignment for every language pair.

Storing all 190 possible aligned corpora would have taken too much disk space,
so they are not included. The program that generates a corpus is included
instead:

    getAlignmentFromXml.pl

It lets you export your own language pair (like the English-French one given
as an example in AlignmentCorpora).

Example of usage for the Italian-English corpus:

Before launching the program, download the alignment-links XML file and make
sure you have uncompressed it (using the gunzip command, for example):

    GET http://wt.jrc.it/lt/Acquis/JRC-Acquis.2.2/alignments/jrc-en-it.xml.gz
    gunzip jrc-en-it.xml.gz

Then get and unpack the two corpora:

    GET http://wt.jrc.it/lt/Acquis/JRC-Acquis.2.2/corpus/jrc-en.tgz
    GET http://wt.jrc.it/lt/Acquis/JRC-Acquis.2.2/corpus/jrc-it.tgz
    tar xzf jrc-en.tgz
    tar xzf jrc-it.tgz

Finally, launch the program using a Perl 5 interpreter:

    perl getAlignmentFromXml.pl jrc-en-it.xml > alignedCorpus_en_it.xml
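
For pairs other than Italian-English, the three download URLs follow the same pattern as in the example. The sketch below only prints them and downloads nothing. The function name acquis_urls is hypothetical and not part of this distribution. It also assumes that the two language codes appear in the alignment file name in the same order as in the example (jrc-en-it), which this README does not confirm for every pair.

```shell
#!/bin/sh
# Sketch: print the alignment and corpus URLs for one language pair.
# BASE_URL is taken from the example commands in this README.
BASE_URL="http://wt.jrc.it/lt/Acquis/JRC-Acquis.2.2"

# acquis_urls is a hypothetical helper, not shipped with the corpus.
# Assumption: the alignment file is named jrc-<first>-<second>.xml.gz,
# with the codes in the order used by the example (en before it).
acquis_urls() {
    first="$1"
    second="$2"
    printf '%s/alignments/jrc-%s-%s.xml.gz\n' "$BASE_URL" "$first" "$second"
    printf '%s/corpus/jrc-%s.tgz\n' "$BASE_URL" "$first"
    printf '%s/corpus/jrc-%s.tgz\n' "$BASE_URL" "$second"
}

# Print the three URLs for the Italian-English example.
acquis_urls en it
```

Each printed URL can then be fetched with the download tool of your choice. If GET is the lwp-request tool from libwww-perl, note that it writes the response to standard output, so its output has to be redirected to a file before running gunzip or tar.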