Generalizability of Argument Identification in Context 2026

Synopsis

  • Task: Decide if a sentence, in its context, constitutes an argument or not.
  • Communication: [mailing lists: participants, organizers]
Join the Touché mailing list

Important Dates

See the CLEF 2026 homepage.

Subscribe to the Touché mailing list to receive notifications.

Task

Given a sentence from a dataset along with metadata about its provenance, such as the source text and the dataset's annotation guidelines, predict whether the sentence can be annotated as an argument or not. In particular, participants are encouraged to develop robust systems that do not rely on lexical shortcuts and that generalize to unseen datasets, and to investigate ways to exploit rich context information for this purpose.

Data

The training data will be provided as a subset of the 17 benchmark datasets that total about 345k labeled sentences and were identified as most relevant for argument identification in this paper. Each sentence in the subset is labeled as argument or no-argument, according to the annotations of its source dataset. The subset comes with metadata such as IDs, generated training and development splits, links to the original data sources and annotation guidelines, as well as the scripts used for data preparation.

Evaluation

The systems will be evaluated on test data that differs from the development data. The test data includes partially or fully held-out portions of the datasets used for sampling, as well as newly created data reflecting diverse domains and annotation guidelines. This setup addresses the risk of data contamination in LLMs and accounts for participants’ potential use of additional datasets during training. Generalizability will be measured using the macro F$_1$-score, which will be reported for each test dataset along with the average of these per-dataset scores.
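
The scoring scheme described above can be sketched as follows. This is a minimal illustration, not the official scorer: the record layout `(dataset, gold, predicted)`, the function names, and the dataset names in the usage note are hypothetical. It also assumes that the overall score is the unweighted mean of the per-dataset scores and that a class with no gold or predicted instances gets an F$_1$ of 0. Only the label names `argument` and `no-argument` come from the data description.

```python
from collections import defaultdict
from statistics import mean

# Label names as given in the data description.
LABELS = ("argument", "no-argument")


def f1_for_label(gold, pred, label):
    """F1 of a single class, treating `label` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    denom = 2 * tp + fp + fn
    # Assumption: an undefined F1 (class absent from gold and predictions) counts as 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    return mean(f1_for_label(gold, pred, label) for label in labels)


def evaluate(records):
    """Score an iterable of (dataset, gold_label, predicted_label) triples.

    Returns the macro F1 per dataset and the unweighted mean over datasets.
    """
    gold_by_dataset = defaultdict(list)
    pred_by_dataset = defaultdict(list)
    for dataset, gold, pred in records:
        gold_by_dataset[dataset].append(gold)
        pred_by_dataset[dataset].append(pred)
    per_dataset = {
        name: macro_f1(gold_by_dataset[name], pred_by_dataset[name])
        for name in gold_by_dataset
    }
    return per_dataset, mean(per_dataset.values())
```

For example, calling `evaluate` on a list of triples such as `("dataset-a", "argument", "no-argument")` returns a dictionary with one macro F$_1$ value per dataset name and the mean of those values. Averaging per dataset rather than pooling all sentences keeps large test sets from dominating the overall score.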

Submission

We ask participants to use TIRA for submissions. Each team can submit up to 3 approaches to the task.

Related Work

Task Committee