Retrieval-Augmented Debating 2025

Synopsis

  • Sub-Task 1: Generate responses to argue against a simulated debate partner.
  • Sub-Task 2: Evaluate systems of sub-task 1.
  • Communication: mailing lists for participants and organizers
Register for participation | Join the Touché mailing list

Important Dates

Subscribe to the Touché mailing list to receive notifications.

  • Nov. 2024: CLEF Registration opened [register]
  • April-May 2025: Approaches submission deadline.
  • May 2025: Participant paper submission.
  • June 2025: Peer review notification.
  • July 2025: Camera-ready participant papers submission.
  • Sep. 2025: CLEF Conference in Madrid and Touché Workshop.

All deadlines are 23:59 CEST (UTC+2).

Task

This task aims to develop generative retrieval systems that argue against their users, either to support users in forming or confirming opinions or to train their debating skills. Participating systems debate with simulated users over multiple turns (following the procedure shown below) and are evaluated based on their responses.

U1: Claim statement
S1: Supposed to attack U1
U2: Attacks S1
S2: Supposed to respond to U2
U3: Attacks S1 or S2
S3: Supposed to respond to U3
U4: Attacks S1 or S2 or S3
S4: Supposed to respond to U4
Debate procedure for sub-task 1. The simulated user always starts by stating a claim and later attacks the system's responses. The system is expected to respond, either by counterattacking or defending.
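The turn-taking rule above (user opens with a claim, then user and system strictly alternate, with each user attack free to target any earlier system response) can be sketched in a few lines. This is an illustrative sketch only; the `Turn` class and `next_speaker` function are hypothetical names, not part of the task's API.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str   # "user" or "system"
    text: str

def next_speaker(turns: list[Turn]) -> str:
    """The user always opens (with a claim); afterwards turns strictly alternate."""
    if not turns:
        return "user"
    return "system" if turns[-1].speaker == "user" else "user"

# Example: after the opening claim U1, the system must produce S1.
debate = [Turn("user", "Claim: ...")]
assert next_speaker(debate) == "system"
```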

Sub-Task 1: Participants submit systems that respond with an utterance Si to a (simulated) user's utterance Ui by (1) retrieving either counterarguments (to Ui) or supporting evidence (for the attacked system utterance) from a provided argument collection and (2) generating a response (of at most 60 words) from the retrieved data.
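A minimal retrieve-then-generate pipeline for sub-task 1 might look like the sketch below. Everything here is an assumption for illustration: retrieval is stubbed as naive word-overlap ranking over the collection, and "generation" simply returns the retrieved text truncated to the task's 60-word response limit; a real submission would plug in proper retrieval and generation models.

```python
def truncate_to_words(text: str, max_words: int = 60) -> str:
    """Enforce the task's cap of at most 60 words per system response."""
    return " ".join(text.split()[:max_words])

def respond(user_utterance: str, argument_collection: list[str]) -> str:
    # (1) Retrieve a counterargument or supporting evidence from the provided
    #     collection (stub: rank by word overlap with the user utterance).
    query_terms = set(user_utterance.lower().split())
    retrieved = max(
        argument_collection,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
    )
    # (2) Generate a response of at most 60 words from the retrieved data
    #     (stub: return the retrieved argument itself, truncated).
    return truncate_to_words(retrieved)
```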

Sub-Task 2: Participants submit systems that assess the responses of the sub-task 1 systems according to one or more of these criteria: (1) relevance of the retrieved counterarguments or evidence; (2) faithfulness of the generated response to the retrieved data; (3) coherence of the system's responses across the conversation; and (4) argumentative quality of the response.
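Since a sub-task 2 system may assess any subset of the four criteria, its output can be thought of as a partial score map. The aggregation below (a plain mean over the assessed criteria) is purely an assumption for illustration; the task description does not prescribe how, or whether, criterion scores are combined.

```python
# The four assessment criteria named in the task description.
CRITERIA = ("relevance", "faithfulness", "coherence", "quality")

def aggregate(scores: dict[str, float]) -> float:
    """Average over whichever of the four criteria a system chose to assess."""
    assessed = [scores[c] for c in CRITERIA if c in scores]
    return sum(assessed) / len(assessed)
```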

We will release more information on the data, evaluation procedure, and baselines in the coming months. Join the Touché mailing list to stay up to date.

Task Committee