Ideology and Power Identification in Parliamentary Debates 2024
Synopsis
This task consists of two subtasks on identifying two important aspects of a speaker in parliamentary debates:
- Sub-Task 1: Given a parliamentary speech in one of several languages, identify the ideology of the speaker's party.
- Sub-Task 2: Given a parliamentary speech in one of several languages, identify whether the speaker's party is currently governing or in opposition.
- Communication: [mailing lists: task, organizers]
- Training data: [download]
- Test data: [download]
- Submission: [baseline] [evaluator] [forum] [submit]
Important Dates
Subscribe to the mailing list to receive notifications.
- Dec. 18, 2023: CLEF Registration opens. [register]
- May 20, 2024: EXTENDED approaches submission deadline.
- May 31, 2024: Participant paper submission.
- June 24, 2024: Peer review notification.
- July 8, 2024: Camera-ready participant papers submission.
- Sep. 9-12, 2024: CLEF Conference in Grenoble and Touché Workshop.
All deadlines are 23:59 CEST (UTC+2).
Task
Debates in national parliaments affect not only fundamental aspects of citizens' lives, but often a broader region, or even the whole world. As a form of political debate, however, parliamentary speeches are often indirect and present a number of challenges to computational analysis. In this task, we focus on identifying two variables associated with speakers in a parliamentary debate: their political ideology and whether they belong to a governing party or a party in opposition. Both subtasks are formulated as binary classification tasks.
Data
The data for this task comes from ParlaMint, a multilingual comparable corpus of parliamentary debates. The data is sampled from ParlaMint in a way that reduces potential confounding variables (e.g., speaker identity). Please join the task mailing list to stay up to date and report problems. The data is provided as tab-separated text files. The following shows a toy example:

    id    speaker  sex  text          text_en                  label
    gb01  spk1     F    First text.   First text in English.   0
    gb02  spk2     M    Second text.  Second text in English.  1
    gb03  spk3     M    Third text.   Third text in English.   0
    gb04  spk4     F    Fourth text.  Fourth text in English.  1
    gb06  spk5     M    Fifth text.   Fifth text in English.   0
- id is a unique (arbitrary) ID for each text.
- speaker is a unique (arbitrary) ID for each speaker. There may be multiple speeches from the same speaker.
- sex is the (binary/biological) sex of the speaker. The values in this field can be Female, Male, and Unspecified/Unknown.
- text is the transcribed text of the parliamentary speech. Real examples may include line breaks and other special sequences, which are escaped or quoted.
- text_en is the automatic translation of the text into English. This field is empty for speeches originally in English, and the translation may also be missing for a small number of non-English speeches.
- label is the binary/numeric label. For political orientation, 0 is left and 1 is right. For power identification, 1 indicates opposition and 0 indicates coalition (governing party).
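As an illustration of the format, the snippet below loads one of the TSV files with pandas. The file name is a hypothetical placeholder; substitute the actual name of the file from the training data package.

    import pandas as pd

    # Hypothetical file name; use the actual training file you downloaded.
    df = pd.read_csv("orientation-gb-train.tsv", sep="\t")

    print(df.columns.tolist())         # ['id', 'speaker', 'sex', 'text', 'text_en', 'label']
    print(df["label"].value_counts())  # class distribution of the binary label

The data covers the following parliaments and regional chambers (parliament codes in parentheses):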
- Austria (at)
- Bosnia and Herzegovina (ba)
- Belgium (be)
- Bulgaria (bg)
- Czechia (cz)
- Denmark (dk)
- Estonia (ee)
- Spain (es)
- Catalonia (es-ct)
- Galicia (es-ga)
- Basque Country (es-pv) [only power]
- Finland (fi)
- France (fr)
- Great Britain (gb)
- Greece (gr)
- Croatia (hr)
- Hungary (hu)
- Iceland (is) [only political orientation]
- Italy (it)
- Latvia (lv)
- The Netherlands (nl)
- Norway (no) [only political orientation]
- Poland (pl)
- Portugal (pt)
- Serbia (rs)
- Sweden (se) [only political orientation]
- Slovenia (si)
- Turkey (tr)
- Ukraine (ua)
Evaluation
Both subtasks use the macro-averaged F1-score as the main evaluation metric. The submission system will evaluate runs automatically using F1-score, precision, and recall.
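The official scores come from the provided evaluator, but the metric itself is easy to reproduce. The short sketch below computes it with scikit-learn on made-up gold labels and predictions, purely for illustration.

    from sklearn.metrics import f1_score, precision_score, recall_score

    # Toy gold labels and predictions, just to illustrate the metric.
    y_true = [0, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 0, 0, 1, 1]

    print("macro F1:       ", f1_score(y_true, y_pred, average="macro"))
    print("macro precision:", precision_score(y_true, y_pred, average="macro"))
    print("macro recall:   ", recall_score(y_true, y_pred, average="macro"))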
Submission
Participants are welcome to take part in any of the task and parliament combinations. We do not provide a special track for multilingual models, but participants are encouraged to make use of cross-lingual approaches to improve their predictions. Participants are allowed to use any external datasets, except the source data from ParlaMint. Submissions are now accepted through TIRA. You need to register your team on TIRA (in addition to registering at CLEF) and pick an alias for your team name (submission is anonymous; you can reveal your true team name after final paper acceptance). You can submit both predictions and dockerized software submissions (for better reproducibility). Please note that you will not be able to see your results on the test dataset until after the deadline. However, you will be informed if your submission fails because of a formatting or software error. We provide a simple linear baseline and an evaluation script, which also include a toy example and examples of how to dockerize your submission.

The predictions should be formatted as simple TSV files with two columns: the id and the prediction. The prediction files should not contain a header row. The names of the files should be formatted as team-task-pcode-runname.tsv. The task is either 'orientation' or 'power', and the pcode is the lowercase code of the parliament (e.g., at or es-ct). The team and runname can be helpful for identifying the team and the approach or run information, but they are not significant for the evaluation script. You can set them to arbitrary strings, but they cannot be empty, and they must not contain a dash (-). All files should be placed in the same directory/folder without further hierarchy. If you are participating in only a subset of the tasks and/or parliaments, you can submit files only for the combinations that you participate in.
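As a minimal sketch of the expected output format, the snippet below writes one prediction file. The team alias, run name, and predictions are made-up placeholders; only the file naming scheme and the two-column, header-less TSV layout follow the rules above.

    import csv

    # Hypothetical team alias and run name; both must be non-empty and contain no dash.
    team, task, pcode, runname = "myteam", "orientation", "gb", "run1"

    # Made-up predictions mapping each text id to a binary label.
    predictions = {"gb01": 0, "gb02": 1, "gb03": 0}

    filename = f"{team}-{task}-{pcode}-{runname}.tsv"
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for text_id, label in predictions.items():
            writer.writerow([text_id, label])  # two columns: id and prediction, no header row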