Further useful linguistic resources are also available there.

Download DCEP: Digital Corpus of the European Parliament

We offer two download options; we strongly recommend the first one.

Option 1: DCEP-2013 - Sentence-aligned bilingual corpora (recommended)

For any language pair, just follow these instructions. The example in the instructions refers to the resources below.

Documents with sentence segmentation

BG CS DA DE EL EN ES ET FI FR GA HU IT LT LV MT NL PL PT RO SK SL SV TR

Sentence alignment information

	CS	DA	DE	EL	EN	ES	ET	FI	FR	GA	HU	IT	LT	LV	MT	NL	PL	PT	RO	SK	SL	SV	TR
BG	BG-CS	BG-DA	BG-DE	BG-EL	BG-EN	BG-ES	BG-ET	BG-FI	BG-FR	BG-GA	BG-HU	BG-IT	BG-LT	BG-LV	BG-MT	BG-NL	BG-PL	BG-PT	BG-RO	BG-SK	BG-SL	BG-SV	BG-TR
CS		CS-DA	CS-DE	CS-EL	CS-EN	CS-ES	CS-ET	CS-FI	CS-FR	CS-GA	CS-HU	CS-IT	CS-LT	CS-LV	CS-MT	CS-NL	CS-PL	CS-PT	CS-RO	CS-SK	CS-SL	CS-SV	CS-TR
DA			DA-DE	DA-EL	DA-EN	DA-ES	DA-ET	DA-FI	DA-FR	DA-GA	DA-HU	DA-IT	DA-LT	DA-LV	DA-MT	DA-NL	DA-PL	DA-PT	DA-RO	DA-SK	DA-SL	DA-SV	DA-TR
DE				DE-EL	DE-EN	DE-ES	DE-ET	DE-FI	DE-FR	DE-GA	DE-HU	DE-IT	DE-LT	DE-LV	DE-MT	DE-NL	DE-PL	DE-PT	DE-RO	DE-SK	DE-SL	DE-SV	DE-TR
EL					EL-EN	EL-ES	EL-ET	EL-FI	EL-FR	EL-GA	EL-HU	EL-IT	EL-LT	EL-LV	EL-MT	EL-NL	EL-PL	EL-PT	EL-RO	EL-SK	EL-SL	EL-SV	EL-TR
EN						EN-ES	EN-ET	EN-FI	EN-FR	EN-GA	EN-HU	EN-IT	EN-LT	EN-LV	EN-MT	EN-NL	EN-PL	EN-PT	EN-RO	EN-SK	EN-SL	EN-SV	EN-TR
ES							ES-ET	ES-FI	ES-FR	ES-GA	ES-HU	ES-IT	ES-LT	ES-LV	ES-MT	ES-NL	ES-PL	ES-PT	ES-RO	ES-SK	ES-SL	ES-SV	ES-TR
ET								ET-FI	ET-FR	ET-GA	ET-HU	ET-IT	ET-LT	ET-LV	ET-MT	ET-NL	ET-PL	ET-PT	ET-RO	ET-SK	ET-SL	ET-SV	ET-TR
FI									FI-FR	FI-GA	FI-HU	FI-IT	FI-LT	FI-LV	FI-MT	FI-NL	FI-PL	FI-PT	FI-RO	FI-SK	FI-SL	FI-SV	FI-TR
FR										FR-GA	FR-HU	FR-IT	FR-LT	FR-LV	FR-MT	FR-NL	FR-PL	FR-PT	FR-RO	FR-SK	FR-SL	FR-SV	FR-TR
GA											GA-HU	GA-IT	GA-LT	GA-LV	GA-MT	GA-NL	GA-PL	GA-PT	GA-RO	GA-SK	GA-SL	GA-SV	GA-TR
HU												HU-IT	HU-LT	HU-LV	HU-MT	HU-NL	HU-PL	HU-PT	HU-RO	HU-SK	HU-SL	HU-SV	HU-TR
IT													IT-LT	IT-LV	IT-MT	IT-NL	IT-PL	IT-PT	IT-RO	IT-SK	IT-SL	IT-SV	IT-TR
LT														LT-LV	LT-MT	LT-NL	LT-PL	LT-PT	LT-RO	LT-SK	LT-SL	LT-SV	LT-TR
LV															LV-MT	LV-NL	LV-PL	LV-PT	LV-RO	LV-SK	LV-SL	LV-SV	LV-TR
MT																MT-NL	MT-PL	MT-PT	MT-RO	MT-SK	MT-SL	MT-SV	MT-TR
NL																	NL-PL	NL-PT	NL-RO	NL-SK	NL-SL	NL-SV	NL-TR
PL																		PL-PT	PL-RO	PL-SK	PL-SL	PL-SV	PL-TR
PT																			PT-RO	PT-SK	PT-SL	PT-SV	PT-TR
RO																				RO-SK	RO-SL	RO-SV	RO-TR
SK																					SK-SL	SK-SV	SK-TR
SL																						SL-SV	SL-TR
SV																							SV-TR
TR
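
The table above lists each unordered language pair exactly once, with the language that comes first in the language list on the left. As a minimal sketch of that naming scheme, the following Python snippet enumerates the pair labels and normalizes an arbitrary pair to the form used in the table. The function names all_pairs and pair_label are hypothetical helpers introduced only for illustration; the snippet produces labels only and assumes nothing about the names of the downloadable files.

```python
from itertools import combinations

# Language codes in the order in which the page lists them.
LANGUAGES = (
    "BG CS DA DE EL EN ES ET FI FR GA HU "
    "IT LT LV MT NL PL PT RO SK SL SV TR"
).split()


def all_pairs(languages=LANGUAGES):
    """Return every pair label once (e.g. 'BG-CS'), row by row as in the table."""
    return [f"{a}-{b}" for a, b in combinations(languages, 2)]


def pair_label(lang1, lang2, languages=LANGUAGES):
    """Return the table label for two language codes (hypothetical helper)."""
    a, b = lang1.upper(), lang2.upper()
    if a == b:
        raise ValueError("a pair needs two different languages")
    for code in (a, b):
        if code not in languages:
            raise ValueError(f"unknown language code: {code}")
    # The language that appears earlier in the list goes first in the label.
    first, second = sorted((a, b), key=languages.index)
    return f"{first}-{second}"


if __name__ == "__main__":
    pairs = all_pairs()
    print(len(pairs), pairs[0], pairs[-1])
    print(pair_label("en", "de"))
```

With 24 languages this yields 24 × 23 / 2 = 276 labels, which matches the triangular layout of the table: the last language, TR, has no row entries of its own because all of its pairs already appear in the TR column.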

Tools to extract parallel bilingual sentences

DCEP-extract-scripts.tar.bz2

Option 2: DCEP-2013 - unaligned version with some tools (unsupported)

Links to the original XML/SGML documents (source), documents with markup removed (strip), document alignment information (index), and some processing tools are below. The examples show the basic usage of the scripts.

Document date: 11 March 2015

The main page for the DCEP resource is at: https://ec.europa.eu/jrc/en/language-technologies. Further useful linguistic resources are also available there.

Download DCEP: Digital Corpus of the European Parliament

Option 1: DCEP-2013 - Sentence-aligned bilingual corpora (recommended)

Documents with sentence segmentation

Sentence alignment information

Tools to extract parallel bilingual sentences

Option 2: DCEP-2013 - unaligned version with some tools (unsupported)

Source XML/SGML documents (source)

Documents with markup removed (strip)

Document alignment information (index)

Markup removal

Sentence segmentation

Text extraction from parallel documents
