Ideology and Power Identification in Parliamentary Debates 2024

Synopsis

This task consists of two subtasks on identifying two important aspects of a speaker in parliamentary debates:
  • Sub-Task 1: Given a parliamentary speech in one of several languages, identify the ideology of the speaker's party.
  • Sub-Task 2: Given a parliamentary speech in one of several languages, identify whether the speaker's party is currently governing or in opposition.
  • Communication: [mailing lists: task, organizers]
  • Training data: [download]
  • Test data: [download]
  • Submission: [baseline] [evaluator] [forum] [submit]

Important Dates

Subscribe to the mailing list to receive notifications.

  • Dec. 18, 2023: CLEF Registration opens. [register]
  • May 20, 2024: EXTENDED approaches submission deadline.
  • May 31, 2024: Participant paper submission.
  • June 24, 2024: Peer review notification.
  • July 8, 2024: Camera-ready participant papers submission.
  • Sep. 9-12, 2024: CLEF Conference in Grenoble and Touché Workshop.

All deadlines are 23:59 CEST (UTC+2).

Task

Debates in national parliaments affect not only fundamental aspects of citizens' lives, but often a broader region, or even the whole world. As a form of political debate, however, parliamentary speeches are often indirect and pose a number of challenges for computational analysis. In this task, we focus on identifying two variables associated with speakers in a parliamentary debate: their political ideology and whether they belong to a governing party or a party in opposition. Both subtasks are formulated as binary classification tasks.

Data

The data for this task comes from ParlaMint, a multilingual collection of comparable corpora of parliamentary debates. The data is sampled from ParlaMint in a way that reduces potential confounding variables (e.g., speaker identity). Please join the task mailing list to stay up-to-date and report problems. The data is provided as tab-separated text files. The following shows a toy example:
id    speaker  sex  text          text_en                  label
gb01  spk1     F    First text.   First text in English.   0
gb02  spk2     M    Second text.  Second text in English.  1
gb03  spk3     M    Third text.   Third text in English.   0
gb04  spk4     F    Fourth text.  Fourth text in English.  1
gb06  spk5     M    Fifth text.   Fifth text in English.   0
  • id is a unique (arbitrary) ID for each text.
  • speaker is a unique (arbitrary) ID for each speaker. There may be multiple speeches from the same speaker.
  • sex is the (binary/biological) sex of the speaker. The values in this field can be Female, Male, and Unspecified/Unknown.
  • text is the transcribed text of the parliamentary speech. Real examples may include line breaks, and other special sequences escaped or quoted.
  • text_en is the automatic translation of the text to English. This field may be empty: it is always empty for speeches in English, and the translation may also be missing for a small number of non-English speeches.
  • label is the binary/numeric label. For political orientation, 0 is left and 1 is right. For power identification, 1 indicates opposition and 0 indicates coalition (or governing party).
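The field layout above can be read with Python's standard csv module. The following is a minimal sketch, not the organizers' loader: it assumes the files have a header row and use standard double-quote quoting for fields that contain line breaks, which may not match the real files exactly. The toy rows and the helper names read_rows and best_text are illustrative only.

```python
import csv
import io

# Toy data mirroring the example above. The third row has a quoted line break
# and an empty text_en field; the quoting convention is an assumption.
TOY_TSV = (
    "id\tspeaker\tsex\ttext\ttext_en\tlabel\n"
    "gb01\tspk1\tF\tFirst text.\tFirst text in English.\t0\n"
    "gb02\tspk2\tM\tSecond text.\tSecond text in English.\t1\n"
    "gb03\tspk3\tM\t\"Third text,\nwith a line break.\"\t\t0\n"
)


def read_rows(handle):
    """Yield one dict per speech; the label is converted to int when present."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Test files have no label column, so only convert when it exists.
        if row.get("label") not in (None, ""):
            row["label"] = int(row["label"])
        yield row


def best_text(row):
    """Prefer the English translation and fall back to the original text."""
    return row["text_en"] or row["text"]


rows = list(read_rows(io.StringIO(TOY_TSV)))
for row in rows:
    print(row["id"], row["label"], repr(best_text(row)))
```

When reading a real file instead of the in-memory string, open it with encoding="utf-8" and newline="" so that the csv module handles embedded line breaks itself.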
Participants are not required to use the first four fields, but may want to use them to improve their predictions (e.g., in a joint/multi-task learning model, or to explain away the effect of speaker style for better generalization). Similarly, the field text_en is provided for convenience. It may help in building quick multilingual classifiers, or in understanding and analyzing data in languages participants do not speak. The test files will include exactly the same fields except label. A small trial sample is provided for both political orientation and power identification. We provide training data for the following national or regional parliaments:
  • Austria (at)
  • Bosnia and Herzegovina (ba)
  • Belgium (be)
  • Bulgaria (bg)
  • Czechia (cz)
  • Denmark (dk)
  • Estonia (ee)
  • Spain (es)
  • Catalonia (es-ct)
  • Galicia (es-ga)
  • Basque Country (es-pv) [only power]
  • Finland (fi)
  • France (fr)
  • Great Britain (gb)
  • Greece (gr)
  • Croatia (hr)
  • Hungary (hu)
  • Iceland (is) [only political orientation]
  • Italy (it)
  • Latvia (lv)
  • The Netherlands (nl)
  • Norway (no) [only political orientation]
  • Poland (pl)
  • Portugal (pt)
  • Serbia (rs)
  • Sweden (se) [only political orientation]
  • Slovenia (si)
  • Turkey (tr)
  • Ukraine (ua)

Evaluation

Both subtasks will use the macro-averaged F1-score as the main evaluation metric. The submission system will evaluate runs automatically using F1-score, precision, and recall.
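For reference, the macro-averaged F1-score is the unweighted mean of the per-class F1-scores, so both classes count equally regardless of how often they occur. The sketch below computes it from scratch for hard 0/1 labels. It is not the official evaluator, and scoring a class as 0 when precision or recall is undefined is an assumption of this sketch.

```python
def macro_f1(gold, pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 over the given labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for cls in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
        # Assumption: undefined precision/recall/F1 are scored as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


gold = [0, 1, 0, 1, 0]
pred = [0, 1, 1, 1, 0]
score = macro_f1(gold, pred)
print(round(score, 3))  # 0.8 (both classes have F1 = 0.8)
```

If scikit-learn is available, f1_score(gold, pred, average="macro") computes the same quantity; the official evaluation script remains the authoritative reference.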

Submission

Participants are welcome to take part in any of the task-parliament combinations. We do not provide a special track for multilingual models, but participants are encouraged to make use of cross-lingual approaches to improve their predictions. Participants are allowed to use any external datasets, except the source data from ParlaMint.

Submissions are now accepted through TIRA. You need to register your team on TIRA (in addition to registering at CLEF) and pick an alias for your team name (submission is anonymous; you can reveal your true team name after final paper acceptance). You can submit both predictions and dockerized software submissions (for better reproducibility). Please note that you will not be able to see your results on the test dataset until after the deadline. However, you will be informed if your submission failed because of a formatting or software error. We provide a simple linear baseline and an evaluation script, which also include a toy example and examples of how to dockerize your submission.

The predictions should be formatted as simple TSV files with two columns: id and the prediction. The prediction files should not contain a header row. The names of the files should be formatted as team-task-pcode-runname.tsv. The task is either 'orientation' or 'power', and the pcode is the lowercase code of the parliament (e.g., at or es-ct). The team and runname can be helpful for identifying the team and the approach or the run information, but they are not significant for the evaluation script. You can set them to arbitrary strings, but they cannot be empty, and they should not contain a dash (-). All files should be placed in the same directory/folder without further hierarchy. If you are participating in only a subset of the tasks and/or parliaments, then you can submit files only for the combinations that you participate in.
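The naming and formatting rules above can be checked mechanically before uploading. The following sketch builds the file name, rejects an empty or dashed team or runname, and writes a headerless two-column TSV. The helper names, the team alias "myteam", the run name "run1", and the speech ids are hypothetical, and writing hard 0/1 labels as the prediction column is an assumption; consult the provided baseline and evaluator for the exact expected values.

```python
import csv
import os
import tempfile

TASKS = {"orientation", "power"}


def prediction_filename(team, task, pcode, runname):
    """Build team-task-pcode-runname.tsv, enforcing the stated constraints."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {sorted(TASKS)}")
    if not pcode or pcode != pcode.lower():
        raise ValueError("pcode must be a non-empty lowercase parliament code")
    for name, value in (("team", team), ("runname", runname)):
        if not value or "-" in value:
            raise ValueError(f"{name} must be non-empty and contain no dash")
    return f"{team}-{task}-{pcode}-{runname}.tsv"


def write_predictions(directory, team, task, pcode, runname, predictions):
    """Write (id, prediction) pairs as a TSV file without a header row."""
    path = os.path.join(directory, prediction_filename(team, task, pcode, runname))
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for speech_id, prediction in predictions:
            writer.writerow([speech_id, prediction])
    return path


# Hypothetical team alias, run name, and speech ids, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    path = write_predictions(
        tmp, "myteam", "power", "es-ct", "run1", [("ct01", 0), ("ct02", 1)]
    )
    with open(path, encoding="utf-8") as handle:
        written = handle.read()

print(os.path.basename(path))
print(written, end="")
```

Note that the parliament code itself may contain a dash (as in es-ct), which is why only team and runname are checked for dashes.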

Task Committee