Argument Retrieval for Controversial Questions 2023

Synopsis

  • Task: Given a controversial topic, the task is to retrieve and rank documents by their relevance and argument quality and to detect the document stance.
  • Input: [topics]
  • Submission: will be added by mid-December.
  • Evaluation: will be added by mid-December.
    Task

    The goal of Task 1 is to provide an overview of arguments and opinions on controversial topics. Given a controversial topic and a collection of web documents, the task is to retrieve and rank documents by relevance to the topic and by argument quality, and to detect the document stance. Participants of Task 2 will retrieve and rank documents from the ClueWeb22 crawl that contain relevant causal evidence for a given set of 50 search topics.


    Data

    Example topic for Task 1 (download topics):

    <topic>
    <number>1</number>
    <title>Should teachers get tenure?</title>
    <description>A user has heard that some countries do give teachers tenure and others don't. 
    Interested in the reasoning for or against tenure, the user searches for positive and negative arguments [...]</description>
    <narrative>Highly relevant arguments make a clear statement about tenure for teachers in schools or universities [...]</narrative>
    </topic>
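
    The topic format above can be read with a standard XML parser. The following is a minimal sketch, assuming that the topics file wraps all <topic> elements in a single <topics> root element (the root name and the helper parse_topics are illustrative and not part of the task definition). The elided "[...]" passages are kept exactly as in the example.

```python
import xml.etree.ElementTree as ET

# Assumption: all <topic> elements sit under one <topics> root element.
# The "[...]" passages are elided in the task description and kept as-is here.
TOPICS_XML = """<topics>
<topic>
<number>1</number>
<title>Should teachers get tenure?</title>
<description>A user has heard that some countries do give teachers tenure and others don't.
Interested in the reasoning for or against tenure, the user searches for positive and negative arguments [...]</description>
<narrative>Highly relevant arguments make a clear statement about tenure for teachers in schools or universities [...]</narrative>
</topic>
</topics>"""


def parse_topics(xml_text):
    """Return one dict per topic with number, title, description, and narrative."""
    root = ET.fromstring(xml_text)
    topics = []
    for node in root.iter("topic"):
        topics.append({
            "number": int(node.findtext("number").strip()),
            "title": node.findtext("title").strip(),
            # Collapse line breaks and repeated spaces inside the longer fields.
            "description": " ".join(node.findtext("description").split()),
            "narrative": " ".join(node.findtext("narrative").split()),
        })
    return topics


topics = parse_topics(TOPICS_XML)
print(topics[0]["number"], topics[0]["title"])
```

    An automatic run would typically use only the title field as the query, since topic titles must not be modified manually.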

    The corpus for Task 1 is ClueWeb22. If your organization already has a copy of the corpus, you can start with it; otherwise, we are currently preparing access to the data, which we expect to make available by mid-December.

    Additional resources:

    Evaluation

    Our human assessors will label the ranked results both for their general topical relevance and for their rhetorical argument quality [paper], i.e., "well-writtenness": (1) whether the document contains arguments and whether the argument text has a good style of speech, (2) whether the text has a proper sentence structure and is easy to follow, and (3) whether it includes profanity, has typos, etc. Optionally, participants can detect the documents' stance: pro, con, neutral, or no stance towards the search topic. We will use nDCG@5 to evaluate rankings and accuracy to evaluate stance detection.
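
    To make the two measures concrete, the sketch below computes nDCG@5 and stance accuracy for a single topic. It is an illustration under assumptions, not the official evaluation script: it uses the linear-gain variant of DCG (the label itself as gain, discounted by log2 of rank + 1), and the graded labels in the usage lines are made up. The function names and the judged_gains parameter are hypothetical.

```python
import math


def dcg_at_k(gains, k=5):
    """Discounted cumulative gain over the first k graded labels (linear gain)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains, judged_gains=None, k=5):
    """nDCG@k: DCG of the submitted ranking divided by the DCG of an ideal ranking.

    ranked_gains: labels of the retrieved documents, in rank order.
    judged_gains: labels of all judged documents for the topic (hypothetical input);
                  falls back to ranked_gains when omitted.
    """
    pool = ranked_gains if judged_gains is None else judged_gains
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


def stance_accuracy(predicted, gold):
    """Fraction of documents whose predicted stance equals the gold stance."""
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Made-up labels for illustration only.
print(round(ndcg_at_k([2, 0, 1]), 4))
print(stance_accuracy(["PRO", "CON", "NEU", "NO"], ["PRO", "CON", "CON", "NO"]))
```

    Because the ranking is scored twice (once with relevance labels, once with quality labels), the same function would be called with each set of labels separately.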

    Submission

    We ask participants to use TIRA for result submissions.

    Runs may be either automatic or manual. An automatic run must not "manipulate" the topic titles via manual intervention. A manual run is anything that is not an automatic run. Upon submission, please let us know which of your runs are manual. For each topic, include up to 1,000 retrieved documents. Each team can submit up to 5 different runs.

    The submission format for the task will follow the standard TREC format:

    qid stance doc rank score tag
    With:
    • qid: The topic number.
    • stance: The stance of the document (PRO: supports the topic, CON: against the topic, NEU: neutral stance, NO: no stance).
    • doc: The document ID.
    • rank: The rank the document is retrieved at.
    • score: The score (integer or floating point) that generated the ranking. The score must be in descending (non-increasing) order. It is important to handle tied scores.
    • tag: A tag that identifies your group and the method you used to produce the run.
    If you do not classify the stance, use Q0 as the value in the stance column. The fields should be separated by whitespace. The widths of the individual columns are not restricted (e.g., the score can have arbitrary precision so that there are no ties), but it is important to include all columns and to separate them with whitespace.
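
    The format described above can be produced with a few lines of code. The following is a minimal sketch: the helper format_run and its parameters are hypothetical, and breaking score ties by document ID is only one possible way to handle tied scores, not a rule stated by the task.

```python
def format_run(qid, scored_docs, tag, max_docs=1000):
    """Build run lines 'qid stance doc rank score tag' for one topic.

    scored_docs: iterable of (doc_id, score, stance) tuples; stance may be None.
    """
    # Sort by descending score. Assumption: ties are broken by document ID
    # so that the resulting order is deterministic.
    ordered = sorted(scored_docs, key=lambda d: (-d[1], d[0]))[:max_docs]
    lines = []
    for rank, (doc_id, score, stance) in enumerate(ordered, start=1):
        stance = stance or "Q0"  # Q0 marks a document without stance classification
        lines.append(f"{qid} {stance} {doc_id} {rank} {score} {tag}")
    return lines


docs = [
    ("clueweb22-en0010-86-00457___3", 16.43, "CON"),
    ("clueweb22-en0010-85-29836___1", 17.89, "PRO"),
    ("clueweb22-en0010-86-09202___5", 16.32, None),
]
for line in format_run(1, docs, "myGroupMyMethod"):
    print(line)
```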

    An example run for Task 1 is:
    1 PRO clueweb22-en0010-85-29836___1 1 17.89 myGroupMyMethod
    1 CON clueweb22-en0010-86-00457___3 2 16.43 myGroupMyMethod
    1 NEU clueweb22-en0010-86-09202___5 3 16.32 myGroupMyMethod
    ...
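
    Before uploading, a run file can be checked locally against the rules stated in this section. The checker below is a sketch and not an official validation tool: validate_run is a hypothetical helper, and the requirement that ranks start at 1 and increase by one per topic is an assumption.

```python
VALID_STANCES = {"PRO", "CON", "NEU", "NO", "Q0"}


def validate_run(lines, max_docs=1000):
    """Return a list of error messages; an empty list means no problem was found."""
    errors = []
    last_score = {}  # qid -> score on the previous line of that topic
    doc_count = {}   # qid -> number of documents seen so far
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 6:
            errors.append(f"line {n}: expected 6 columns, found {len(fields)}")
            continue
        qid, stance, doc, rank, score, tag = fields
        if stance not in VALID_STANCES:
            errors.append(f"line {n}: unknown stance {stance!r}")
        try:
            rank_value, score_value = int(rank), float(score)
        except ValueError:
            errors.append(f"line {n}: rank or score is not numeric")
            continue
        doc_count[qid] = doc_count.get(qid, 0) + 1
        # Assumption: ranks start at 1 and increase by one within a topic.
        if rank_value != doc_count[qid]:
            errors.append(f"line {n}: expected rank {doc_count[qid]}, found {rank_value}")
        if doc_count[qid] == max_docs + 1:
            errors.append(f"line {n}: more than {max_docs} documents for topic {qid}")
        if qid in last_score and score_value > last_score[qid]:
            errors.append(f"line {n}: score increases within topic {qid}")
        last_score[qid] = score_value
    return errors


example = [
    "1 PRO clueweb22-en0010-85-29836___1 1 17.89 myGroupMyMethod",
    "1 CON clueweb22-en0010-86-00457___3 2 16.43 myGroupMyMethod",
    "1 NEU clueweb22-en0010-86-09202___5 3 16.32 myGroupMyMethod",
]
print(validate_run(example))
```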

    Task Committee