The main page for the DCEP resource is at:
https://ec.europa.eu/jrc/en/language-technologies
Further useful linguistic resources are also available there.
Download DCEP: Digital Corpus of the European Parliament
We offer two download options; we strongly recommend using
the first one.
Option 1: DCEP-2013 - Sentence-aligned bilingual corpora (recommended)
For any language pair, you just have to follow these instructions. The
example refers to the resources below.
Documents with sentence segmentation
BG
CS
DA
DE
EL
EN
ES
ET
FI
FR
GA
HU
IT
LT
LV
MT
NL
PL
PT
RO
SK
SL
SV
TR
Sentence alignment information
Tools to extract parallel bilingual sentences
DCEP-extract-scripts.tar.bz2
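The archives listed above are distributed as .tar.bz2 files. As a minimal sketch, assuming the archive has already been downloaded to a local path, it could be unpacked with Python's standard tarfile module. The function name unpack_archive and the target directory are hypothetical and chosen for illustration only; the layout of the files inside the archive is not described on this page, so the sketch simply returns the member names.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, target_dir):
    """Unpack a .tar.bz2 archive into target_dir and return its member names.

    unpack_archive is a hypothetical helper, not part of the DCEP tools.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extract_options = {}
    # Newer Python versions provide extraction filters that reject unsafe
    # members (absolute paths, links pointing outside the target directory).
    if hasattr(tarfile, "data_filter"):
        extract_options["filter"] = "data"
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        names = archive.getnames()
        archive.extractall(path=target, **extract_options)
    return names
```

For example, unpack_archive("DCEP-extract-scripts.tar.bz2", "dcep") would unpack the extraction scripts into a local directory named dcep, assuming the file is in the current working directory.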
Option 2: DCEP-2013 - unaligned version with some tools (unsupported)
Links to the original XML/SGML documents (source), documents with
markup removed (strip), document alignment information (index), and
some processing tools are listed below. The examples show the
basic usage of the scripts.
Source XML/SGML documents (source)
BG
CS
DA
DE
EL
EN
ES
ET
FI
FR
GA
HU
IT
LT
LV
MT
NL
PL
PT
RO
SK
SL
SV
TR
Documents with markup removed (strip)
BG
CS
DA
DE
EL
EN
ES
ET
FI
FR
GA
HU
IT
LT
LV
MT
NL
PL
PT
RO
SK
SL
SV
TR
Document alignment information (index)
cross-lingual-index.txt.bz2
Markup removal
DCEP-strip-scripts.tar.bz2
Sentence segmentation
DCEP-sentence-scripts.tar.bz2
Text extraction from parallel documents
DCEP-index-scripts.tar.bz2
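The index file listed above (cross-lingual-index.txt.bz2) is a single bz2-compressed text file rather than a tar archive. Before running any of the tools, it can be useful to look at its first few lines without decompressing it to disk. The following is a minimal sketch using Python's standard bz2 module; the helper name preview_bz2_text is hypothetical, and UTF-8 is an assumed encoding, since this page does not state the encoding or the line format of the index.

```python
import bz2
from itertools import islice


def preview_bz2_text(path, max_lines=5, encoding="utf-8"):
    """Return the first max_lines lines of a bz2-compressed text file.

    preview_bz2_text is a hypothetical helper; the UTF-8 default is an
    assumption, not something stated by the DCEP download page.
    """
    # bz2.open in text mode decompresses incrementally, so only the
    # beginning of the file is read.
    with bz2.open(path, mode="rt", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in islice(handle, max_lines)]
```

For example, preview_bz2_text("cross-lingual-index.txt.bz2", max_lines=3) would return the first three lines of the index as a list of strings, assuming the file has been downloaded to the current working directory.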
Document date: 11 March 2015