Evidence Retrieval for Causal Questions 2023

Synopsis

  • Task: Given a causality-related topic, retrieve and rank documents by relevance to the topic and detect the document's causal stance.
  • Input: [topics]
  • Submission: details will be added by mid-December.
  • Evaluation: details will be added by mid-December.

Task

The goal of Task 2 is to support users who want to understand whether a causal relationship between two events or actions exists. Given a causality-related topic and a collection of web documents, the task is to retrieve and rank documents by relevance to the topic and to detect the document's causal stance (i.e., whether the document supports, refutes, or provides no information about the causal statement in the title).

Data

Example topic for Task 2 (download topics):

<topic>
<number>1</number>
<title>Can eating broccoli lead to constipation?</title>
<cause>broccoli</cause>
<effect>constipation</effect>
<description>A young parent has a child experiencing constipation after eating some broccoli for dinner and is wondering [...]</description>
<narrative>Relevant documents will discuss if broccoli and other high fiber foods can cause or ease constipation [...]</narrative>
</topic>
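
The topics are provided as XML. The following is a minimal sketch of reading such a file with Python's standard library; the file name topics.xml and a <topics> root element wrapping the individual <topic> entries are assumptions, not part of the official distribution.

import xml.etree.ElementTree as ET

# Assumes the topics are stored in topics.xml under a <topics> root element.
tree = ET.parse("topics.xml")
topics = []
for topic in tree.getroot().iter("topic"):
    topics.append({
        "number": int(topic.findtext("number")),
        "title": topic.findtext("title"),
        "cause": topic.findtext("cause"),
        "effect": topic.findtext("effect"),
        "description": topic.findtext("description"),
        "narrative": topic.findtext("narrative"),
    })

# For an automatic run, only the <title> field may be used as the query.
for t in topics:
    print(t["number"], t["title"])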

The corpus for Task 2 is ClueWeb22. If your organization already has a copy of the corpus, you can start with it; otherwise, we are currently preparing access to the data, which we expect to make available by mid-December.

Evaluation

Our human assessors will label the retrieved documents according to two relevance dimensions: (1) whether the document is on topic, i.e., contains information about the causal relationship between the events in a question; the direction of causality will be considered, e.g., a document stating that B causes A will be considered off-topic for the question "Does A cause B?"; and (2) if the document is on topic, whether the contained evidence is circumstantial (e.g., a single observation of the co-occurrence of two events) or general (e.g., a statement gained through inductive reasoning). We will use nDCG@5 to evaluate rankings and accuracy to evaluate stance detection.
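
For illustration, a minimal nDCG@5 sketch in Python; the graded relevance labels (e.g., 0, 1, 2) and the linear gain function are assumptions, since the exact label mapping used by the assessors is not specified here.

import math

def dcg_at_k(relevances, k=5):
    # Discounted cumulative gain over the top-k ranked documents.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_5(ranked_relevances, judged_relevances):
    # Normalize by the DCG of an ideal (descending) ordering of all judged labels.
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k=5)
    return dcg_at_k(ranked_relevances, k=5) / ideal if ideal > 0 else 0.0

# Relevance labels of the top-5 retrieved documents vs. all judged documents for one topic.
print(ndcg_at_5([2, 0, 1, 2, 0], [2, 2, 2, 1, 1, 0, 0]))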

Submission

We ask participants to use TIRA for result submissions.

Runs may be automatic, semi-automatic, or manual. An automatic run must use only the topic title and must not manipulate it via manual intervention. Semi-automatic runs may additionally use the <cause> and <effect> fields. A manual run is anything that is not an automatic or semi-automatic run. Upon submission, please let us know the type of each of your runs. For each topic, include up to 1,000 retrieved documents. Each team can submit up to 5 different runs.

The submission format for the task will follow the standard TREC format:

qid stance doc rank score tag
With:
  • qid: The topic number.
  • stance: The causal stance of the document (SUP: the document provides evidence for the causal relation, REF: the document provides evidence against the causal relation, NEU: neutral stance, i.e., the document provides inconclusive evidence or both supporting and refuting evidence, NO: the document provides no evidence).
  • doc: The document ID.
  • rank: The rank the document is retrieved at.
  • score: The score (integer or floating point) that generated the ranking. Scores must be in descending (non-increasing) order; take care to handle tied scores.
  • tag: A tag that identifies your group and the method you used to produce the run.
If you do not classify the stance, use Q0 as the value in the stance column. The fields should be separated by whitespace. The individual columns' widths are not restricted (e.g., the score can have arbitrary precision so that ties are avoided), but it is important to include all columns and to separate them with whitespace.

An example run for Task 2 is:
1 SUP clueweb22-en0010-85-29836___1 1 17.89 myGroupMyMethod
1 REF clueweb22-en0010-86-00457___3 2 16.43 myGroupMyMethod
1 NEU clueweb22-en0010-86-09202___5 3 16.32 myGroupMyMethod
...
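
The following is a minimal sketch of writing a run file in this format; the result data and the output file name run.txt are placeholders, and the stance labels follow the definitions above.

# Per topic: a list of (document ID, stance, retrieval score) produced by your system.
results = {
    1: [
        ("clueweb22-en0010-85-29836___1", "SUP", 17.89),
        ("clueweb22-en0010-86-00457___3", "REF", 16.43),
    ],
}

with open("run.txt", "w") as f:
    for qid, docs in results.items():
        # Sort by score in descending order and keep at most 1,000 documents per topic.
        ranked = sorted(docs, key=lambda d: d[2], reverse=True)[:1000]
        for rank, (doc_id, stance, score) in enumerate(ranked, start=1):
            f.write(f"{qid} {stance} {doc_id} {rank} {score} myGroupMyMethod\n")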

Task Committee