The main page for the DCEP resource is:

https://ec.europa.eu/jrc/en/language-technologies
Further useful linguistic resources are also available there.

Download DCEP: Digital Corpus of the European Parliament

We offer two download options and strongly recommend the first one.

Option 1: DCEP-2013 - Sentence-aligned bilingual corpora (recommended)

For any language pair, just follow these instructions. The example refers to the resources below.

Documents with sentence segmentation

BG CS DA DE EL EN ES ET FI FR GA HU IT LT LV MT NL PL PT RO SK SL SV TR

Sentence alignment information

BG-CS BG-DA BG-DE BG-EL BG-EN BG-ES BG-ET BG-FI BG-FR BG-GA BG-HU BG-IT BG-LT BG-LV BG-MT BG-NL BG-PL BG-PT BG-RO BG-SK BG-SL BG-SV BG-TR
CS-DA CS-DE CS-EL CS-EN CS-ES CS-ET CS-FI CS-FR CS-GA CS-HU CS-IT CS-LT CS-LV CS-MT CS-NL CS-PL CS-PT CS-RO CS-SK CS-SL CS-SV CS-TR
DA-DE DA-EL DA-EN DA-ES DA-ET DA-FI DA-FR DA-GA DA-HU DA-IT DA-LT DA-LV DA-MT DA-NL DA-PL DA-PT DA-RO DA-SK DA-SL DA-SV DA-TR
DE-EL DE-EN DE-ES DE-ET DE-FI DE-FR DE-GA DE-HU DE-IT DE-LT DE-LV DE-MT DE-NL DE-PL DE-PT DE-RO DE-SK DE-SL DE-SV DE-TR
EL-EN EL-ES EL-ET EL-FI EL-FR EL-GA EL-HU EL-IT EL-LT EL-LV EL-MT EL-NL EL-PL EL-PT EL-RO EL-SK EL-SL EL-SV EL-TR
EN-ES EN-ET EN-FI EN-FR EN-GA EN-HU EN-IT EN-LT EN-LV EN-MT EN-NL EN-PL EN-PT EN-RO EN-SK EN-SL EN-SV EN-TR
ES-ET ES-FI ES-FR ES-GA ES-HU ES-IT ES-LT ES-LV ES-MT ES-NL ES-PL ES-PT ES-RO ES-SK ES-SL ES-SV ES-TR
ET-FI ET-FR ET-GA ET-HU ET-IT ET-LT ET-LV ET-MT ET-NL ET-PL ET-PT ET-RO ET-SK ET-SL ET-SV ET-TR
FI-FR FI-GA FI-HU FI-IT FI-LT FI-LV FI-MT FI-NL FI-PL FI-PT FI-RO FI-SK FI-SL FI-SV FI-TR
FR-GA FR-HU FR-IT FR-LT FR-LV FR-MT FR-NL FR-PL FR-PT FR-RO FR-SK FR-SL FR-SV FR-TR
GA-HU GA-IT GA-LT GA-LV GA-MT GA-NL GA-PL GA-PT GA-RO GA-SK GA-SL GA-SV GA-TR
HU-IT HU-LT HU-LV HU-MT HU-NL HU-PL HU-PT HU-RO HU-SK HU-SL HU-SV HU-TR
IT-LT IT-LV IT-MT IT-NL IT-PL IT-PT IT-RO IT-SK IT-SL IT-SV IT-TR
LT-LV LT-MT LT-NL LT-PL LT-PT LT-RO LT-SK LT-SL LT-SV LT-TR
LV-MT LV-NL LV-PL LV-PT LV-RO LV-SK LV-SL LV-SV LV-TR
MT-NL MT-PL MT-PT MT-RO MT-SK MT-SL MT-SV MT-TR
NL-PL NL-PT NL-RO NL-SK NL-SL NL-SV NL-TR
PL-PT PL-RO PL-SK PL-SL PL-SV PL-TR
PT-RO PT-SK PT-SL PT-SV PT-TR
RO-SK RO-SL RO-SV RO-TR
SK-SL SK-SV SK-TR
SL-SV SL-TR
SV-TR
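The alignment table lists each unordered language pair exactly once, with the two codes in the order of the language list, which is alphabetical (so the English-French data is under EN-FR, never FR-EN). A minimal Python sketch that maps any two codes to the label used in the table; the helper name `pair_label` is our own and is not part of the DCEP scripts:

```python
# Language codes in the order used by the DCEP download tables (alphabetical).
LANGS = "BG CS DA DE EL EN ES ET FI FR GA HU IT LT LV MT NL PL PT RO SK SL SV TR".split()


def pair_label(a: str, b: str) -> str:
    """Return the table label for a language pair, e.g. ("FR", "EN") -> "EN-FR".

    Hypothetical helper: it only reproduces the ordering seen in the table.
    """
    a, b = a.upper(), b.upper()
    if a not in LANGS or b not in LANGS:
        raise ValueError(f"unknown language code: {a!r} or {b!r}")
    if a == b:
        raise ValueError("a language pair needs two different languages")
    first, second = sorted((a, b), key=LANGS.index)
    return f"{first}-{second}"


# Every pair in the table: 24 languages give 24 * 23 / 2 = 276 pairs.
ALL_PAIRS = [f"{x}-{y}" for i, x in enumerate(LANGS) for y in LANGS[i + 1:]]

if __name__ == "__main__":
    print(pair_label("fr", "en"))
    print(len(ALL_PAIRS))
```

Because the table order happens to be alphabetical, `"-".join(sorted((a, b)))` would give the same labels; sorting by position in `LANGS` just makes the dependence on the table explicit.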

Tools to extract parallel bilingual sentences

DCEP-extract-scripts.tar.bz2
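Conceptually, extracting parallel sentences joins the sentence-segmented documents of both languages through the alignment information for the pair. The Python sketch below illustrates only that join, under an assumed in-memory representation: each document is a list of sentences, and each alignment link pairs a group of source sentence indices with a group of target sentence indices (so 1-1, 2-1 and 1-0 links can be expressed). This representation and the name `bisentences` are hypothetical; the real file formats and usage are defined by the scripts in DCEP-extract-scripts.tar.bz2.

```python
from typing import Iterator, Sequence

# Hypothetical alignment link: (source sentence indices, target sentence indices).
Link = tuple[Sequence[int], Sequence[int]]


def bisentences(src: Sequence[str], tgt: Sequence[str],
                links: Sequence[Link]) -> Iterator[tuple[str, str]]:
    """Yield (source, target) text pairs, one per alignment link.

    Sentences grouped in one link are joined with a space; links where one
    side is empty (a sentence with no counterpart) are skipped.
    """
    for src_ids, tgt_ids in links:
        if not src_ids or not tgt_ids:
            continue  # unaligned sentence: not a parallel pair
        yield (" ".join(src[i] for i in src_ids),
               " ".join(tgt[j] for j in tgt_ids))


if __name__ == "__main__":
    en = ["Good morning.", "The sitting is open.", "Thank you."]
    fr = ["Bonjour.", "La séance est ouverte.", "Merci.", "(Applaudissements)"]
    links = [([0], [0]), ([1], [1]), ([2], [2]), ([], [3])]
    for s, t in bisentences(en, fr, links):
        print(f"{s}\t{t}")  # one tab-separated bisentence per line
```

Skipping 1-0 and 0-1 links is a design choice for building a clean bilingual corpus; keep them instead if you need to account for every sentence.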

Option 2: DCEP-2013 - unaligned version with some tools (unsupported)

Below are links to the original XML/SGML documents (source), the documents with markup removed (strip), the document alignment information (index), and some processing tools. The examples show the basic usage of the scripts.

Source XML/SGML documents (source)

BG CS DA DE EL EN ES ET FI FR GA HU IT LT LV MT NL PL PT RO SK SL SV TR

Documents with markup removed (strip)

BG CS DA DE EL EN ES ET FI FR GA HU IT LT LV MT NL PL PT RO SK SL SV TR

Document alignment information (index)

cross-lingual-index.txt.bz2
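The index is a bzip2-compressed text file, so it can be streamed without first decompressing it to disk. A minimal Python sketch using the standard `bz2` module; the UTF-8 encoding and the assumption that each line holds whitespace-separated fields are ours, so inspect the real layout (or DCEP-index-scripts) before relying on the split:

```python
import bz2
from itertools import islice
from typing import Iterator


def index_rows(path: str = "cross-lingual-index.txt.bz2") -> Iterator[list[str]]:
    """Stream the compressed index line by line, skipping blank lines.

    Assumptions: UTF-8 text, whitespace-separated fields.
    """
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                yield line.split()


if __name__ == "__main__":
    # Peek at the first few rows to learn the actual format.
    for row in islice(index_rows(), 5):
        print(row)
```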

Markup removal

DCEP-strip-scripts.tar.bz2
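Markup removal for this corpus is done by DCEP-strip-scripts. Purely to illustrate the step, here is a generic Python sketch that keeps only the character data of an XML/SGML document, using the standard library's tolerant `html.parser` (which does not require well-formed XML). It is not the DCEP method, and its output should not be expected to match the distributed strip files; the set of tags that start a new line is a hypothetical choice.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and drop all tags (generic illustration)."""

    # Hypothetical choice: tags that start a new line in the output.
    BLOCK_TAGS = {"p", "title", "li", "td", "br"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and similar
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(text: str) -> str:
    parser = TextOnly()
    parser.feed(text)
    parser.close()  # flush any buffered text
    # Collapse whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `strip_markup("<doc><title>Report</title><p>Text &amp; more</p></doc>")` yields the two lines `Report` and `Text & more`. Processing instructions, declarations and comments are ignored by the parser's default handlers.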

Sentence segmentation

DCEP-sentence-scripts.tar.bz2
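Sentence segmentation for DCEP is provided by DCEP-sentence-scripts. As a deliberately naive stand-in that only shows what the step does, the Python sketch below breaks after `.`, `!` or `?` when the next word starts with a capital letter. It is not the DCEP segmenter: it will, for instance, wrongly split "Mr. Schulz" into two sentences.

```python
import re

# Candidate boundary: whitespace right after sentence-final punctuation.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: accept a boundary only if the next piece starts uppercase."""
    text = text.strip()
    if not text:
        return []
    sentences: list[str] = []
    for piece in _BOUNDARY.split(text):
        if sentences and not piece[:1].isupper():
            sentences[-1] += " " + piece  # lowercase start: same sentence
        else:
            sentences.append(piece)
    return sentences
```

`str.isupper` is Unicode-aware, so the capital-letter check also works for Greek and Cyrillic text; a real segmenter would additionally need per-language abbreviation lists.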

Text extraction from parallel documents

DCEP-index-scripts.tar.bz2
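Extracting text from parallel documents means using the document alignment index to find, for two languages, the documents that are translations of each other. The Python sketch below shows only that selection step. It assumes each index row has already been parsed into a mapping from language code to the path of that language's stripped document; this representation and the name `parallel_documents` are hypothetical and do not describe the actual index format or DCEP-index-scripts.

```python
from typing import Iterable, Iterator, Mapping


def parallel_documents(rows: Iterable[Mapping[str, str]], src: str, tgt: str
                       ) -> Iterator[tuple[str, str]]:
    """Yield (source_path, target_path) for every row covering both languages.

    Assumption: each row maps language codes (e.g. "EN") to document paths.
    Rows lacking either language are skipped.
    """
    for row in rows:
        if src in row and tgt in row:
            yield row[src], row[tgt]


if __name__ == "__main__":
    rows = [
        {"EN": "en/1.txt", "FR": "fr/1.txt"},
        {"EN": "en/2.txt"},  # no French version
        {"EN": "en/3.txt", "FR": "fr/3.txt", "DE": "de/3.txt"},
    ]
    for en_path, fr_path in parallel_documents(rows, "EN", "FR"):
        print(en_path, fr_path)
```

Each selected pair can then be read with an ordinary file read and passed on to sentence segmentation and alignment.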

Document date: 11 March 2015