TEI Header

type:corpus
date:2006-05-14 (created)

File Description

Title Statement:
Title:
Alignments for the JRC-Acquis Corpus
Responsibility Statement:
Dániel Varga
  • Alignment
  • Responsibility Statement:
    Bruno Pouliquen (Joint Research Centre, Italy)
    Camelia Ignat (Joint Research Centre, Italy)
    Anna Widiger (Joint Research Centre, Italy)
    Tomaž Erjavec (Jožef Stefan Institute, Slovenia)
    Ralf Steinberger (Joint Research Centre, Italy)
    Compilation of the corpus.
    Principal researcher:
    Ralf Steinberger
    Address:
    European Commission
    Joint Research Centre - IPSC
    Ralf Steinberger
    T.P. 267
    Via E. Fermi 1
    21020 Ispra (VA)
    Italy
    Fax: (+39) 0332 78 5154
    EMail: Ralf.Steinberger@jrc.it
    URL: http://www.jrc.it/langtech/
    Edition Statement:
    Edition:
    Version 2.2
    Extent:
    Publication Statement:
    Place of publication:
    http://www.jrc.it/langtech/
    Availability (free):

    The alignment links are freely distributed, but please acknowledge their use in any publications. For conditions of use on the primary data see the JRC-Acquis corpus headers.

    Distributor:
    Address:
    Language Technology Group
    Joint Research Centre - IPSC
    European Commission
    T.P. 267
    Via E. Fermi 1
    21020 Ispra (VA)
    Italy
    Fax: (+39) 0332 78 5154
    URL: http://www.jrc.it/langtech
    Source Description:

    Created in electronic format.

    Encoding Description

    Sampling declaration:

    Of the available ACQUIS documents, only those with at least 10 translations were selected for alignment, and at least 3 of those translations had to be in languages not from the 'old' EU, i.e. Czech, Estonian, Hungarian, Lithuanian, Latvian, Maltese, Polish, Romanian, Slovak and Slovene.
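The selection rule above can be sketched as a small filter. This is only an illustration under assumptions: the function name, its parameters, and the reading of "translations" as the number of available language versions of a document are not taken from the corpus tooling. The language codes are the ones listed under "Language use" in this header.

```python
# Codes of the languages the sampling declaration lists as not from the 'old' EU.
NEW_EU_LANGS = {"cs", "et", "hu", "lt", "lv", "mt", "pl", "ro", "sk", "sl"}


def qualifies_for_alignment(langs, min_translations=10, min_new=3):
    """Return True if a document meets the sampling rule.

    `langs` is an iterable of language codes in which the document exists.
    Hypothetical helper: counting language versions as "translations" is an
    assumption made for this sketch.
    """
    langs = set(langs)
    enough_versions = len(langs) >= min_translations
    enough_new = len(langs & NEW_EU_LANGS) >= min_new
    return enough_versions and enough_new
```

For example, a document available in cs, et, hu and seven 'old' EU languages would pass, while one with only two of the listed languages would not.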

    Editorial declaration:

    Information on the corpus compilation can be found in:
    Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Tomaž Erjavec, Dan Tufiş, and Dániel Varga. The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In: Proceedings of the Fifth Intl. Conf. on Language Resources and Evaluation, LREC'06, Genoa, Italy, 2006. Available at http://www.jrc.it/langtech/index.html#Publications.

    Tags declaration:
    body =
    All alignments for a language pair.
    div =
    Alignment of one document.
    head =
    Title of document, one head for each language.
    p =
    A short description of the alignment.
    linkGrp =
    Link group for alignment of one document. Attributes are @targType = aligned target elements; @select = aligned languages; @id = corpus-wide unique ID; @xtargets = references to IDs of aligned documents, with ';' as the delimiter.
    link =
    Alignment link; points to aligned elements. Attributes are @type = link arity ([0-2]-[0-2]); @xtargets = references to aligned paragraphs. Within @xtargets, the ';' delimiter separates the aligned documents (in the order given by parent::linkGrp/@xtargets), and a space separates the @n values of the paragraphs (the elements named by parent::linkGrp/@targType).
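As a rough illustration of how the linkGrp and link attributes described above fit together, the following sketch reads one link group with Python's standard xml.etree.ElementTree module. The sample fragment is invented for this example: the document IDs, the @select and @id values, and the paragraph numbers are hypothetical, and the helper names are not part of any corpus tooling. Real alignment files may also use namespaces or wrapper elements that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Invented sample; all attribute values here are hypothetical.
SAMPLE = """
<linkGrp targType="p" select="en sl" id="example-en-sl" xtargets="doc-en;doc-sl">
  <link type="1-1" xtargets="1;1"/>
  <link type="2-1" xtargets="2 3;2"/>
  <link type="0-1" xtargets=";3"/>
</linkGrp>
"""


def parse_xtargets(xtargets):
    """Split 'a b;c' into [['a', 'b'], ['c']]: ';' between documents, space between @n values."""
    return [side.split() for side in xtargets.split(";")]


def arity(link_type):
    """Turn a @type value such as '2-1' into [2, 1]."""
    return [int(n) for n in link_type.split("-")]


def read_link_group(xml_text):
    """Return the aligned document IDs and a list of (type, {doc_id: [n, ...]}) pairs."""
    grp = ET.fromstring(xml_text.strip())
    docs = grp.get("xtargets").split(";")
    links = []
    for link in grp.iter("link"):
        sides = parse_xtargets(link.get("xtargets"))
        links.append((link.get("type"), dict(zip(docs, sides))))
    return docs, links


docs, links = read_link_group(SAMPLE)
for link_type, targets in links:
    # The declared arity should match the number of @n values on each side.
    assert arity(link_type) == [len(targets[d]) for d in docs]
```

The @n values are kept as strings here, since the header does not state that they are always plain integers.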

    Profile Description

    Language use:
    cs = Czech
    da = Danish
    de = German
    el = Greek
    en = English
    es = Spanish
    et = Estonian
    fi = Finnish
    fr = French
    hu = Hungarian
    it = Italian
    lt = Lithuanian
    lv = Latvian
    nl = Dutch
    mt = Maltese
    no = Norwegian
    pl = Polish
    pt = Portuguese
    ro = Romanian
    sk = Slovak
    sl = Slovene
    sv = Swedish

    Revision Description