Touché 2020

1st Shared Task on Argument Retrieval

Co-located with CLEF 2020 in Thessaloniki, Greece. Evaluations will run from January to June. The conference will take place September 22-25, 2020.


Decision-making processes, whether at the societal or the personal level, eventually come to a point where one side challenges the other with a why-question, which is a prompt to justify one's stance. Technologies for argument mining and argumentation processing are maturing at a rapid pace, giving rise for the first time to argument retrieval. We invite you to participate in the first lab on Argument Retrieval at CLEF 2020, featuring two subtasks:

(1) retrieval in a focused argument collection to support argumentative conversations;
(2) retrieval in a generic web crawl to answer comparative questions with argumentative results.

Subtask (1) is motivated by the need to support users who search for arguments directly, e.g., to support their stance, and targets argumentative conversations. The task is to retrieve arguments for the 50 given topics, which cover a wide range of controversial issues, from the provided dataset, a focused crawl of online debate portals.

Subtask (2) is motivated by the need to support users with arguments in personal decisions from everyday life that involve making choices. The task is to retrieve and rank documents from the general web crawl ClueWeb12 that help users answer their comparative questions. We provide 50 such questions.

Should you have questions contact us via

Register now


November 22, 2019: Training data available, competition begins. 
February 01, 2020: Submission system opens.
April 26, 2020: Submission system closes, manual evaluation begins.
TBA: Leaderboard (TIRA).

The timezone of all deadlines is Anywhere on Earth.


Argument topics for subtask (1) and comparative questions for subtask (2) will be sent to each team via email upon completed registration. The topics will be provided as XML files.

Example topic for subtask (1):

      <title>Is climate change real?</title>
      <description>You read an opinion piece on how climate change is a hoax and disagree. Now you are looking for arguments supporting the claim that climate change is in fact real.</description>
      <narrative>Relevant arguments will support the given stance that climate change is real or attack a hoax side's argument.</narrative>
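As a rough illustration of how such a topic file might be read, the following sketch parses topics with Python's standard library. Only the title, description, and narrative elements appear in the example above; the enclosing `<topics>` and `<topic>` elements and the `<number>` element are assumptions made here so the snippet is well-formed, and the function name `load_topics` is hypothetical. The actual files may be structured differently.

```python
import xml.etree.ElementTree as ET

# Assumed layout: the <topics>/<topic> wrappers and <number> are NOT taken
# from the task description; adjust them to the files you receive.
# The description and narrative texts are shortened for this sketch.
SAMPLE = """<topics>
  <topic>
    <number>1</number>
    <title>Is climate change real?</title>
    <description>You read an opinion piece on how climate change is a hoax and disagree.</description>
    <narrative>Relevant arguments will support the given stance that climate change is real.</narrative>
  </topic>
</topics>"""


def load_topics(xml_text):
    """Return one dict per topic with whitespace-stripped text fields."""
    root = ET.fromstring(xml_text)
    topics = []
    for node in root.iter("topic"):
        topics.append({
            "number": (node.findtext("number") or "").strip(),
            "title": (node.findtext("title") or "").strip(),
            "description": (node.findtext("description") or "").strip(),
            "narrative": (node.findtext("narrative") or "").strip(),
        })
    return topics


topics = load_topics(SAMPLE)
# Map each topic number to its title, e.g. to use the title as the query.
queries = {t["number"]: t["title"] for t in topics}
```

Missing elements are read as empty strings, so the sketch does not fail if a topic lacks one of the fields.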

Document collections. To search for relevant arguments, you can use your own index based on the dataset args-me or, for simplicity, use the API of the search engine

Example topic for subtask (2):

      <title>What are advantages and disadvantages of PHP over Python and vice versa?</title>
      <description>The user is looking for differences and similarities of PHP and Python and wants to know about scenarios that favor one over the other.</description>
      <narrative>Relevant documents may contain an overview of more than these two programming languages but must include both of them with an explicit comparison of these two.</narrative>

Document collections. To search for relevant documents, you can use your own index based on ClueWeb12 or, for simplicity, use the API of the search engine ChatNoir. You will receive credentials to access the API upon completed registration.


We will add detailed evaluation requirements and submission instructions next week.

Runs Submission

We encourage participants to use TIRA for their submissions to increase the replicability of the experiments. We provide a dedicated TIRA tutorial for Touché and are available to walk you through it. You can also submit runs by email. In both cases, we will review your submission promptly and provide feedback.

Runs may be either automatic or manual. An automatic run is made without any manual manipulation of the given topic titles. Your run is automatic if you do not use the description and narrative fields when developing your approach. A manual run is anything that is not an automatic run. Please let us know which of your runs are manual upon submission.

The submission format for both tasks will follow the standard TREC format:

qid Q0 doc rank score tag


  • qid: The topic number.
  • Q0: Unused, should always be Q0.
  • doc: The document id returned by your system for the topic qid:
    • For subtask (1): Use the official args-me id.
    • For subtask (2): Use the official ClueWeb12 id.
  • rank: The rank the document is retrieved at.
  • score: The score (integer or floating point) that generated the ranking. The score must be in descending (non-increasing) order. It is important to handle tied scores. (trec_eval sorts documents by the score values and not your rank values.)
  • tag: A tag that identifies your group and the method you used to produce the run.
The fields should be separated by whitespace. The width of the columns is not important, but every line must include all columns with some whitespace between them.
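The field list above can be turned into run lines with a few lines of Python. This is a minimal sketch, not an official tool: the function name `format_run` is hypothetical, and breaking ties by document id is just one possible way to keep the order deterministic when scores are equal.

```python
def format_run(qid, scored_docs, tag):
    """Turn (doc_id, score) pairs for one topic into TREC-format run lines.

    Documents are sorted by descending score, so the score column is
    non-increasing. Ties are broken by document id (an assumption of this
    sketch) so that repeated calls give the same output. Ranks start at 1.
    """
    ordered = sorted(scored_docs, key=lambda pair: (-pair[1], pair[0]))
    lines = []
    for rank, (doc_id, score) in enumerate(ordered, start=1):
        # Fields: qid Q0 doc rank score tag, separated by single spaces.
        lines.append(f"{qid} Q0 {doc_id} {rank} {score} {tag}")
    return lines


results = [
    ("10006689-2019-04-18T18:27:51Z-00000-000", 16.42),
    ("10113b57-2019-04-18T17:05:08Z-00001-000", 17.89),
    ("100531be-2019-04-18T19:18:31Z-00000-000", 16.43),
]
run_lines = format_run(1, results, "myGroupMyMethod")
print("\n".join(run_lines))
```

Because trec_eval orders documents by score rather than by the rank column, deriving the ranks from the sorted scores, as done here, keeps the two columns consistent.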

An example run for subtask (1) is:
1 Q0 10113b57-2019-04-18T17:05:08Z-00001-000 1 17.89 myGroupMyMethod
1 Q0 100531be-2019-04-18T19:18:31Z-00000-000 2 16.43 myGroupMyMethod
1 Q0 10006689-2019-04-18T18:27:51Z-00000-000 3 16.42 myGroupMyMethod

An example run for subtask (2) is:
1 Q0 clueweb09-en0010-85-29836 1 17.89 myGroupMyMethod
1 Q0 clueweb09-en0010-86-00457 2 16.43 myGroupMyMethod
1 Q0 clueweb09-en0010-86-09202 3 16.42 myGroupMyMethod
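Before submitting, it may help to check a run against the format rules described above. The sketch below is an informal sanity check under the stated rules only (six whitespace-separated fields, `Q0` in the second column, integer rank, numeric score that does not increase within a topic); the function name `check_run` is hypothetical, and passing this check does not guarantee that a run is accepted.

```python
def check_run(lines):
    """Return a list of problems found in TREC-format run lines (empty if none)."""
    problems = []
    last_score = {}  # qid -> score seen on the previous line of that topic
    for n, line in enumerate(lines, start=1):
        fields = line.split()  # any amount of whitespace separates fields
        if len(fields) != 6:
            problems.append(f"line {n}: expected 6 fields, found {len(fields)}")
            continue
        qid, q0, doc, rank, score, tag = fields
        if q0 != "Q0":
            problems.append(f"line {n}: second field must be Q0")
        if not rank.isdigit():
            problems.append(f"line {n}: rank is not an integer")
        try:
            value = float(score)
        except ValueError:
            problems.append(f"line {n}: score is not a number")
            continue
        # Scores must be non-increasing within each topic.
        if qid in last_score and value > last_score[qid]:
            problems.append(f"line {n}: score increases within topic {qid}")
        last_score[qid] = value
    return problems


example = [
    "1 Q0 10113b57-2019-04-18T17:05:08Z-00001-000 1 17.89 myGroupMyMethod",
    "1 Q0 100531be-2019-04-18T19:18:31Z-00000-000 2 16.43 myGroupMyMethod",
    "1 Q0 10006689-2019-04-18T18:27:51Z-00000-000 3 16.42 myGroupMyMethod",
]
issues = check_run(example)
```

For a file on disk, the same function can be applied to the lines read from it, for instance `check_run(open(path).read().splitlines())`, where `path` is whatever file name you chose for the run.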


The workshop program will be announced closer to the conference.

Organizing Committee