This directory contains the result of Vanilla alignment for every language pair.

Storing all 190 possible aligned corpora would take too much disk space, so they are not included. Instead, the program that generates an aligned corpus is provided:

  getAlignmentFromXml.pl

It allows you to export your own language pair (like the English-French pair given as an example in AlignmentCorpora).



Example of usage for Italian-English corpus:

    Before launching it, make sure you have downloaded and uncompressed
    (using the gunzip command, for example) the alignment-links XML file:

      GET http://wt.jrc.it/lt/Acquis/JRC-Acquis.2.2/alignments/jrc-en-it.xml.gz > jrc-en-it.xml.gz
      gunzip jrc-en-it.xml.gz

    Then, you need to get and unpack the two corpora:

      GET http://wt.jrc.it/lt/Acquis/JRC-Acquis.2.2/corpus/jrc-en.tgz > jrc-en.tgz
      GET http://wt.jrc.it/lt/Acquis/JRC-Acquis.2.2/corpus/jrc-it.tgz > jrc-it.tgz

      tar xzf jrc-en.tgz
      tar xzf jrc-it.tgz

    Then you can launch the program using a Perl 5 interpreter:

      perl getAlignmentFromXml.pl jrc-en-it.xml > alignedCorpus_en_it.xml
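
The same steps can be sketched for another language pair with a small shell helper. This is only an illustration, not part of the distribution: the function name print_steps and the variable ACQUIS_BASE are hypothetical, and it assumes that other pairs follow the same file naming as the en-it example (jrc-XX-YY.xml.gz for the links, jrc-XX.tgz for each corpus) and that GET is the lwp-request command, which writes to standard output (hence the redirections). The helper only prints the commands so that they can be checked before being run.

```shell
#!/bin/sh
# Base URL taken from the Italian-English example.
ACQUIS_BASE=http://wt.jrc.it/lt/Acquis/JRC-Acquis.2.2

# Print (without running) the commands for the pair "$1-$2".
# Assumption: file names follow the pattern of the en-it example;
# the language codes are used in the order given.
print_steps() {
    l1=$1
    l2=$2
    # Alignment links: download, then uncompress.
    echo "GET $ACQUIS_BASE/alignments/jrc-$l1-$l2.xml.gz > jrc-$l1-$l2.xml.gz"
    echo "gunzip jrc-$l1-$l2.xml.gz"
    # The two monolingual corpora: download, then unpack.
    for l in "$l1" "$l2"; do
        echo "GET $ACQUIS_BASE/corpus/jrc-$l.tgz > jrc-$l.tgz"
        echo "tar xzf jrc-$l.tgz"
    done
    # Generate the aligned corpus.
    echo "perl getAlignmentFromXml.pl jrc-$l1-$l2.xml > alignedCorpus_${l1}_${l2}.xml"
}

# Example: show the steps for the Italian-English pair.
print_steps en it
```

Once the printed commands look right, they can be executed by piping the output to a shell, for example: print_steps en it | sh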