Synopsis

The task consists of three sub-tasks:

  • Sub-Task 1 — Fallacy Detection: Given an argument, determine whether it is fallacious.
  • Sub-Task 2 — Fallacy Classification: Given a fallacious argument, identify the specific type of fallacy.
  • Sub-Task 3 — Argument Scheme Classification: Given a non-fallacious argument, determine which argumentation scheme it follows.

Important Dates

Event                              Date
Participant Registration Opens     November 17, 2025
Participant Registration Closes    April 23, 2026
Run Submission                     May 7, 2026 (extended to May 21, 2026)
Notebook Paper Submission          May 28, 2026
Notebook Paper Notification        June 30, 2026
Notebook Camera Ready              July 6, 2026

All times are in the Anywhere on Earth (AoE) timezone. See also the CLEF 2026 homepage.

Subscribe to the Touché mailing list to receive notifications.

Task

Participants are given a short source text together with a set of extracted arguments derived from that text. The arguments constitute the main units of analysis.

In Sub-Task 1, participants must determine whether a given argument is fallacious. In Sub-Task 2, the argument is known to be fallacious, and participants must identify the specific type of fallacy. In Sub-Task 3, the argument is known to be non-fallacious, and participants must identify the argumentation scheme it follows.

Participants may take part in any one, two, or all three sub-tasks.

This first edition of the task focuses on the most frequently occurring types of fallacies and argument schemes. Rather than predicting individual argumentation schemes directly, we adopt the two orthogonal dimensions (argument_goal and argument_basis) from Macagno’s framework, which group many diverse schemes by shared underlying properties (see Remarks).

Data

The dataset, called informal_fallacies, is taken from Sahai et al. and is the sole source of fallacies used in this task. It will be released soon.

Each data entry contains the following fields:

  • id: Unique identifier for the entry.
  • text_raw: The original Reddit comment.
  • text_raw_parent: The parent comment of text_raw.
  • text_raw_title: Title of the thread text_raw originates from.
  • text_base: Self-contained rewrite of text_raw that incorporates relevant information from text_raw_parent and text_raw_title.
  • argument_base: {claim, supports} extracted from text_base, making the argument structure explicit.
  • text_enhanced: Rewrite of text_base that explicitly surfaces fallacy and scheme information.
  • argument_enhanced: {claim, supports} extracted from text_enhanced.
  • fallacy_exists: Binary label (0/1) indicating whether the entry contains a fallacy; used in Sub-Task 1 (Fallacy Detection).
  • fallacy_type: The specific type of fallacy; used in Sub-Task 2 (Fallacy Classification).
  • resembles_fallacy: The fallacy type this argument most closely resembles. For fallacious entries, this equals fallacy_type. For non-fallacious entries, it identifies which fallacy type the argument could plausibly be confused with, i.e., the argument exhibits a similar reasoning pattern but does not commit the fallacy.
  • classification: {argument_goal, argument_basis}; used in Sub-Task 3 (Argument Scheme Classification).
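
To illustrate how these fields fit together, here is a minimal Python sketch that loads entries and prints the explicit argument structure. It assumes the dataset is distributed as a JSONL file (one entry per line) named informal_fallacies.jsonl; the actual file name and distribution format may differ once the data is released.

import json

# Assumed distribution: one JSON object per line in informal_fallacies.jsonl.
# The actual file name and format may differ once the data is released.
with open("informal_fallacies.jsonl", encoding="utf-8") as f:
    entries = [json.loads(line) for line in f]

for entry in entries[:3]:
    argument = entry["argument_base"]  # {"claim": ..., "supports": [...]}
    print("id:", entry["id"])
    print("claim:", argument["claim"])
    for support in argument["supports"]:
        print("support:", support)
    print("fallacy_exists:", entry["fallacy_exists"])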

Remarks

Claim and Supports

An argument generally consists of a conclusion and one or more premises. Because real-world arguments are informal, we use the term support rather than premise to emphasize that a support may be partial and may not fully justify the claim.

Resembles Fallacy

Each non-fallacious entry in the dataset was selected because its reasoning pattern resembles a specific fallacy type. For example, a non-fallacious entry with resembles_fallacy: authority contains a legitimate appeal to an authority figure — structurally similar to a fallacious appeal to authority, but without committing the fallacy. This pairing is useful for Sub-Task 1 (Fallacy Detection), as it ensures that systems must distinguish genuinely fallacious arguments from superficially similar but valid ones.

Base vs. Enhanced Fields

Arguments on Reddit are often unclear and rely on unstated background assumptions. The text_base field addresses this by providing a self-contained version of the argument that integrates context from the parent comment and thread title.

Fallacy detection and argument scheme classification are both difficult tasks. To make them more tractable, we additionally provide the enhanced fields (text_enhanced, argument_enhanced), which were rewritten with knowledge of the corresponding scheme and fallacy labels.

Participants may choose which fields to use. Systems that rely only on the original (non-enhanced) data are evaluated separately from systems that use the enhanced fields.

Argument Scheme Classification (Sub-Task 3)

The argument_goal and argument_basis dimensions are drawn from Macagno’s classification of argumentation schemes. These are not argumentation schemes themselves — rather, they are two orthogonal dimensions that group many diverse schemes together according to shared underlying properties. We adopt them as the two fundamental axes for this task.

argument_goal — the purpose of the argument:

  • Practical — arguments aimed at assessing the desirability of a course of action.
  • Epistemic — arguments aimed at establishing the acceptability of a judgment about a state of affairs.

argument_basis — the kind of means the argument uses to support the claim:

  • Internal — based on properties of the subject matter itself (e.g., causes, consequences, definitions, values).
  • External — based on the source of the statement (e.g., authority, expertise, popular opinion, ad hominem).

Submission

We ask participants to submit their runs via TIRA.

Submissions are run submissions, i.e., participants process the data on their side and submit only their predictions.

Output Format

For each sub-task, participants submit a separate JSONL file (one JSON object per line). Each object must contain the following fields:

  • task: The sub-task the prediction corresponds to. One of: fallacy_detection, fallacy_classification, scheme_classification.
  • id: The ID of the argument being classified.
  • label: The label predicted by your system (see the allowed labels per sub-task below).
  • tag: Either base or enhanced, indicating whether the system used only the original (non-enhanced) fields or also the enhanced fields.
  • system_description: Optional. A short free-text description of the approach, for easier identification of runs. May be left empty.

Allowed Labels per Sub-Task

Sub-Task 1 — fallacy_detection:

  • fallacy
  • non-fallacy

Sub-Task 2 — fallacy_classification:

  • authority
  • black-white
  • hasty_generalization
  • natural
  • population
  • slippery_slope
  • tradition
  • worse_problems

Sub-Task 3 — scheme_classification:

Labels are the combination of argument_goal and argument_basis, joined by a hyphen:

  • practical-internal
  • practical-external
  • epistemic-internal
  • epistemic-external
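
Since the Sub-Task 3 label set is simply the Cartesian product of the two dimensions, it can be generated mechanically. A minimal Python sketch, for illustration only:

from itertools import product

# The four Sub-Task 3 labels: goal and basis joined by a hyphen.
goals = ("practical", "epistemic")
bases = ("internal", "external")
labels = [f"{goal}-{basis}" for goal, basis in product(goals, bases)]
print(labels)
# ['practical-internal', 'practical-external', 'epistemic-internal', 'epistemic-external']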

Examples

Run file for Sub-Task 1 (Fallacy Detection):

{"task": "fallacy_detection", "id": "informal_fallacies-test-171", "label": "fallacy", "tag": "base", "system_description": "RoBERTa fine-tuned on text_base"}
{"task": "fallacy_detection", "id": "informal_fallacies-test-172", "label": "non-fallacy", "tag": "base", "system_description": "RoBERTa fine-tuned on text_base"}

Run file for Sub-Task 2 (Fallacy Classification):

{"task": "fallacy_classification", "id": "informal_fallacies-test-204", "label": "hasty_generalization", "tag": "enhanced", "system_description": ""}
{"task": "fallacy_classification", "id": "informal_fallacies-test-205", "label": "authority", "tag": "enhanced", "system_description": ""}

Run file for Sub-Task 3 (Scheme Classification):

{"task": "scheme_classification", "id": "informal_fallacies-test-309", "label": "practical-internal", "tag": "base", "system_description": ""}
{"task": "scheme_classification", "id": "informal_fallacies-test-412", "label": "epistemic-external", "tag": "base", "system_description": ""}

If you have any questions, please do not hesitate to contact us.

Evaluation

Each sub-task is evaluated on a held-out test set using standard metrics: precision, recall, and F1-score.
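
The official evaluation script and averaging strategy are not specified here. As a rough approximation, macro-averaged scores could be computed with scikit-learn; the gold and predicted labels below are made up purely for illustration.

from sklearn.metrics import precision_recall_fscore_support

# Toy gold and predicted labels for Sub-Task 1 (illustrative only).
gold = ["fallacy", "non-fallacy", "fallacy", "non-fallacy"]
pred = ["fallacy", "fallacy", "fallacy", "non-fallacy"]

p, r, f1, _ = precision_recall_fscore_support(
    gold, pred, average="macro", zero_division=0
)
print(f"macro P={p:.2f} R={r:.2f} F1={f1:.2f}")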

Task Committee