Causality Extraction 2026
Synopsis
- Sub-Task 1: Given a natural language text, classify whether it contains causal information or not.
- Sub-Task 2: Given a natural language text, identify text spans that are good candidates to express events or concepts that are stated to partake in a causal relationship.
- Sub-Task 3: Given a natural language text and a candidate pair of events or concepts, E1 and E2, classify the type of causal relationship expressed between E1 and E2.
- Communication: mailing lists (participants, organizers)
Important Dates
See the CLEF 2026 homepage.
Subscribe to the Touché mailing list to receive notifications.
Task
Given a natural language sentence, the extraction of causal statements (causality extraction) can be split into three steps, each of which is its own Sub-Task (a worked example follows the list):
- Classify whether the entire sentence contains causal information or not.
- Identify text spans that are good candidates to partake in a claimed or refuted causal relationship.
- Given a pair of candidates, classify whether the sentence supports (pro-causal) or refutes (con-causal) a causal relationship, or makes no statement about how the pair relates (uncausal).
Participants can choose to participate in one, two, or all three Sub-Tasks.
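To make the three Sub-Tasks concrete, the following sketch walks one sentence through all steps. The sentence, spans, and labels are invented for illustration and are not taken from the dataset.
sentence = "Smoking causes cancer."
# Sub-Task 1: the sentence contains causal information.
contains_causal_information = 1
# Sub-Task 2: candidate spans as character offsets, here assumed
# to have exclusive end indices (the dataset card is authoritative).
candidate_spans = [[0, 7], [15, 21]]  # "Smoking", "cancer"
# Sub-Task 3: the sentence states that E1 ("Smoking") causes
# E2 ("cancer"), so the pair is pro-causal.
relationship_label = 1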
Data
The dataset for all three Sub-Tasks will be published at a later date. For now, a dummy dataset is provided that uses the same format as the final dataset. Participants can use the published splits to train and validate their submissions. For evaluation, we will use a separate, unpublished test split in the same format.
Data format
The dataset follows the Dataset Card Specification, which configures the file structure and tasks for each Sub-Task and split. To open the dataset, download and unpack it from Zenodo; then Hugging Face Datasets can be used to load it. The following Python code opens the dataset and prints the first entry of the training set for each Sub-Task (causality detection, causal candidate extraction, and causality identification) to the console.
from pathlib import Path
from datasets import load_dataset

# Path to the unpacked dataset; replace with the actual location!
path_to_dataset = Path(".")

# Load each Sub-Task configuration and print its splits
# and the first entry of its training set.
for config in ("causality detection", "causal candidate extraction", "causality identification"):
    dataset = load_dataset(str(path_to_dataset.absolute()), config)
    print(config.title())
    print("Splits:", list(dataset.keys()))
    print("Example:", dataset["train"][0])
    print()
Submission
We ask participants to use TIRA for result submissions. Each team can submit up to one approach per Sub-Task.
Submission for Sub-Task 1
The submissions for Sub-Task 1 need to be made as a code submission.
The output of the code submission needs to be a JSONL file. Each line in the JSONL file should be in the following JSON format:
- id: The ID of the text that was classified.
- label: The label assigned by your classifier: 1 if the text contains (pro- or con-)causal information and 0 otherwise.
- tag: A tag that identifies your group and the method you used to produce the run.
Example submission file
{"id": "cnc_train_01_0_234_0", "label": 1, "tag": "myGroupMyMethod"}
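Because each line must be a standalone JSON object, run files can be written with Python's json module. A minimal sketch, assuming a hypothetical predictions list; only the field names above are prescribed by the task:
import json

# Hypothetical predictions: (text ID, binary label) pairs from your classifier.
predictions = [("cnc_train_01_0_234_0", 1)]

with open("run.jsonl", "w", encoding="utf-8") as file:
    for text_id, label in predictions:
        record = {"id": text_id, "label": label, "tag": "myGroupMyMethod"}
        file.write(json.dumps(record) + "\n")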
Submission for Sub-Task 2
The submissions for Sub-Task 2 need to be made as a code submission.
The output of the code submission needs to be a JSONL file. Each line in the JSONL file should be in the following JSON format:
- id: The ID of the text that was classified.
- spans: The list of spans your submission predicted to partake in a causal relationship according to the input text. A span is a pair of two non-negative integers that give the start and end index in characters.
- tag: A tag that identifies your group and the method you used to produce the run.
Example submission file
{"id": "cnc_train_01_0_234_0", "spans": [[0, 10], [20, 25]], "tag": "myGroupMyMethod"}
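The character offsets can be sanity-checked by slicing the input text. A minimal sketch, assuming end indices are exclusive as in Python slicing (an assumption; the dataset card is authoritative) and using an invented sentence:
text = "Smoking causes cancer in many patients."
spans = [[0, 7], [15, 21]]
# Recover the predicted substrings from the character spans.
print([text[start:end] for start, end in spans])  # ['Smoking', 'cancer']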
Submission for Sub-Task 3
The submissions for Sub-Task 3 need to be made as a code submission.
The output of the code submission needs to be a JSONL file. Each line in the JSONL file should be in the following JSON format:
- id: The ID of the text that was classified.
- label: The label assigned by your classifier: 0 if the marked spans (ARG0 and ARG1) are uncausal (nothing can be said about how ARG0 causally influences ARG1), 1 if they are pro-causal (ARG0 causes ARG1), and 2 if they are con-causal (ARG0 does not cause ARG1).
- tag: A tag that identifies your group and the method you used to produce the run.
Example submission file
{"id": "cnc_train_01_0_234_0_1", "label": 1, "tag": "myGroupMyMethod"}
Evaluation
Evaluation for Sub-Task 1
Sub-Task 1 is evaluated as a binary classification task using the F1-score.
Evaluation for Sub-Task 2
Sub-Task 2 is evaluated as a token-classification problem with BIO tags using the F1-score.
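The exact tokenization used by the evaluator is not specified here. The following sketch shows the general idea of mapping predicted character spans onto token-level BIO tags, assuming simple whitespace tokenization and exclusive end indices:
def spans_to_bio(text, spans):
    # Assign one BIO tag to each whitespace token of the text.
    tags = []
    offset = 0
    for token in text.split():
        start = text.index(token, offset)
        end = start + len(token)
        offset = end
        # Spans that overlap this token's character range.
        covering = [span for span in spans if span[0] < end and start < span[1]]
        if not covering:
            tags.append("O")
        elif any(start <= span[0] for span in covering):
            tags.append("B")  # a span begins inside this token
        else:
            tags.append("I")  # the token continues a span
    return tags

print(spans_to_bio("Smoking causes cancer.", [[0, 7], [15, 21]]))  # ['B', 'O', 'B']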
Evaluation for Sub-Task 3
Sub-Task 3 is evaluated as a ternary classification task using the F1-score.
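For the classification Sub-Tasks, the F1-score can be computed with standard libraries. A minimal sketch using scikit-learn on invented toy labels; the averaging mode for the ternary case (macro below) is an assumption, not confirmed by the organizers:
from sklearn.metrics import f1_score

# Invented toy labels for the ternary case of Sub-Task 3.
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 1, 1, 1, 2]
print(f1_score(y_true, y_pred, average="macro"))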

