Fallacy Detection 2026
Synopsis
The task consists of three sub-tasks:
- Sub-Task 1 — Fallacy Detection: Given an argument, determine whether it is fallacious.
- Sub-Task 2 — Fallacy Classification: Given a fallacious argument, identify the specific type of fallacy.
- Sub-Task 3 — Argument Scheme Classification: Given a non-fallacious argument, determine which argumentation scheme it follows.
- Communication: [mailing lists: participants, organizers]
Important Dates
| Event | Date |
|---|---|
| Participant Registration Opens | November 17, 2025 |
| Participant Registration Closes | April 23, 2026 |
| Run Submission | |
| Notebook Paper Submission | May 28, 2026 |
| Notebook Paper Notification | June 30, 2026 |
| Notebook Camera Ready | July 6, 2026 |
All times are in Anywhere on Earth (AoE) timezone. See also the CLEF 2026 homepage.
Subscribe to the Touché mailing list to receive notifications.
Task
Participants are given a short source text together with a set of extracted arguments derived from that text. The arguments constitute the main units of analysis.
In Sub-Task 1, participants must determine whether a given argument is fallacious or not. In Sub-Task 2, it is known that the argument is fallacious, and participants must identify the specific type of fallacy. In Sub-Task 3, it is known that the argument is non-fallacious, and participants must identify the argument scheme it follows.
Participants can choose to participate in any one, any two, or all three sub-tasks.
This first edition of the task focuses on the most frequently occurring types of fallacies and
argument schemes. Rather than predicting individual argumentation schemes directly, we adopt the
two orthogonal dimensions (argument_goal and argument_basis) from
Macagno’s framework, which group many diverse schemes by shared
underlying properties (see Remarks).
Data
The data is taken from the informal_fallacies dataset of Sahai et al., which is the sole source of fallacies used in this task. The dataset will be released soon.
Each data entry contains the following fields:
| Field | Description |
|---|---|
| id | Unique identifier for the entry. |
| text_raw | The original Reddit comment. |
| text_raw_parent | The parent comment of text_raw. |
| text_raw_title | Title of the thread text_raw originates from. |
| text_base | Self-contained rewrite of text_raw that incorporates relevant information from text_raw_parent and text_raw_title. |
| argument_base | {claim, supports} extracted from text_base, making the argument structure explicit. |
| text_enhanced | Rewrite of text_base that explicitly surfaces fallacy and scheme information. |
| argument_enhanced | {claim, supports} extracted from text_enhanced. |
| fallacy_exists | Binary label (0/1) indicating whether the entry contains a fallacy — used in Sub-Task 1 (Fallacy Detection). |
| fallacy_type | The specific type of fallacy — used in Sub-Task 2 (Fallacy Classification). |
| resembles_fallacy | The fallacy type this argument most closely resembles. For fallacious entries, this equals fallacy_type. For non-fallacious entries, it identifies which fallacy type the argument could plausibly be confused with — i.e., the argument exhibits a similar reasoning pattern but does not commit the fallacy. |
| classification | {argument_goal, argument_basis} — used in Sub-Task 3 (Argument Scheme Classification). |
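Once the dataset is released, entries in this format could be consumed as sketched below. The JSONL layout and the helper names are assumptions; only the field names come from the table above.

```python
import json

def load_entries(path):
    """Read one dataset entry per line from a JSONL file (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def relevant_fields(entry, use_enhanced=False):
    """Pick the argument representation and labels a system would consume.

    Labels absent for a given entry (e.g. `classification` for fallacious
    entries) come back as None.
    """
    return {
        "id": entry["id"],
        "argument": entry["argument_enhanced" if use_enhanced else "argument_base"],
        "fallacy_exists": entry.get("fallacy_exists"),   # Sub-Task 1
        "fallacy_type": entry.get("fallacy_type"),       # Sub-Task 2
        "classification": entry.get("classification"),   # Sub-Task 3
    }
```

Systems restricted to the original data would set `use_enhanced=False` and report the `base` tag at submission time.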
Remarks
Claim and Supports
An argument generally consists of a conclusion and one or more premises. Because real-world arguments are informal, we use the term support rather than premise to emphasize that a support may be partial and may not fully justify the claim.
Resembles Fallacy
Each non-fallacious entry in the dataset was selected because its reasoning pattern resembles a
specific fallacy type. For example, a non-fallacious entry with
resembles_fallacy: authority contains a legitimate appeal to an authority figure —
structurally similar to a fallacious appeal to authority, but without committing the fallacy.
This pairing is useful for Sub-Task 1 (Fallacy Detection), as it ensures that systems
must distinguish genuinely fallacious arguments from superficially similar but valid ones.
Base vs. Enhanced Fields
Arguments on Reddit are often unclear and rely on background assumptions. The text_base
field addresses this by producing a self-contained version of the argument that integrates context
from the parent comment and thread title.
Fallacy detection and argument scheme detection are both difficult tasks. To make the data more
tractable, we additionally provide the enhanced fields (text_enhanced,
argument_enhanced), which have been rewritten with knowledge of the corresponding
scheme and fallacy labels.
Participants may choose which fields to use. Systems that rely only on the original (non-enhanced) data are evaluated separately from systems that use the enhanced fields.
Argument Scheme Classification (Sub-Task 3)
The argument_goal and argument_basis dimensions are drawn from
Macagno’s classification of argumentation schemes. These are not argumentation schemes themselves
— rather, they are two orthogonal dimensions that group many diverse schemes together
according to shared underlying properties. We adopt them as the two fundamental axes for this task.
argument_goal — the purpose of the argument:
- Practical — arguments aimed at assessing the desirability of a course of action.
- Epistemic — arguments aimed at establishing the acceptability of a judgment about a state of affairs.
argument_basis — the kind of means the argument uses to support the claim:
- Internal — based on properties of the subject matter itself (e.g., causes, consequences, definitions, values).
- External — based on the source of the statement (e.g., authority, expertise, popular opinion, ad hominem).
Submission
We ask participants to use TIRA for run submissions.
The submissions for this task must be made as a run submission, i.e., participants process the data on their side and submit their predictions.
Output Format
For each sub-task, participants submit a separate JSONL file (one JSON object per line). Each object must contain the following fields:
| Field | Description |
|---|---|
| task | The sub-task the prediction corresponds to. One of: fallacy_detection, fallacy_classification, scheme_classification. |
| id | The ID of the argument being classified. |
| label | The label predicted by your system (see allowed labels per sub-task below). |
| tag | Either base or enhanced, indicating whether the system used only the original (non-enhanced) fields or also the enhanced fields. |
| system_description | Optional. A short free-text description of the approach, for easier identification of runs. Can be left empty. |
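A run file in this shape could be produced as follows; the function name and the mapping of IDs to predictions are illustrative, while the record fields match the table above.

```python
import json

def write_run(path, task, predictions, tag="base", description=""):
    """Write one JSON object per line in the required run-file shape.

    `predictions` maps argument IDs to predicted labels (hypothetical data).
    """
    with open(path, "w", encoding="utf-8") as f:
        for arg_id, label in predictions.items():
            record = {
                "task": task,
                "id": arg_id,
                "label": label,
                "tag": tag,
                "system_description": description,
            }
            f.write(json.dumps(record) + "\n")
```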
Allowed Labels per Sub-Task
Sub-Task 1 — fallacy_detection:
- fallacy
- non-fallacy
Sub-Task 2 — fallacy_classification:
- authority
- black-white
- hasty_generalization
- natural
- population
- slippery_slope
- tradition
- worse_problems
Sub-Task 3 — scheme_classification:
Labels are the combination of argument_goal and argument_basis, joined by a hyphen:
- practical-internal
- practical-external
- epistemic-internal
- epistemic-external
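Before submitting, it may help to check each run-file line against the allowed labels above. The sketch below is not an official checker; the label sets are transcribed from this section, and the Sub-Task 3 labels are built as every goal-basis combination joined by a hyphen.

```python
import json

ALLOWED = {
    "fallacy_detection": {"fallacy", "non-fallacy"},
    "fallacy_classification": {
        "authority", "black-white", "hasty_generalization", "natural",
        "population", "slippery_slope", "tradition", "worse_problems",
    },
    # Sub-Task 3: every argument_goal / argument_basis combination.
    "scheme_classification": {
        f"{goal}-{basis}"
        for goal in ("practical", "epistemic")
        for basis in ("internal", "external")
    },
}

def validate_line(line):
    """Return None if the run-file line is valid, else an error message."""
    obj = json.loads(line)
    missing = {"task", "id", "label", "tag"} - obj.keys()
    if missing:
        return f"missing fields: {sorted(missing)}"
    if obj["task"] not in ALLOWED:
        return f"unknown task: {obj['task']}"
    if obj["label"] not in ALLOWED[obj["task"]]:
        return f"label {obj['label']!r} not allowed for {obj['task']}"
    if obj["tag"] not in {"base", "enhanced"}:
        return f"invalid tag: {obj['tag']}"
    return None
```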
Examples
Run file for Sub-Task 1 (Fallacy Detection):
{"task": "fallacy_detection", "id": "informal_fallacies-test-171", "label": "fallacy", "tag": "base", "system_description": "RoBERTa fine-tuned on text_base"}
{"task": "fallacy_detection", "id": "informal_fallacies-test-172", "label": "non-fallacy", "tag": "base", "system_description": "RoBERTa fine-tuned on text_base"}
Run file for Sub-Task 2 (Fallacy Classification):
{"task": "fallacy_classification", "id": "informal_fallacies-test-204", "label": "hasty_generalization", "tag": "enhanced", "system_description": ""}
{"task": "fallacy_classification", "id": "informal_fallacies-test-205", "label": "authority", "tag": "enhanced", "system_description": ""}
Run file for Sub-Task 3 (Scheme Classification):
{"task": "scheme_classification", "id": "informal_fallacies-test-309", "label": "practical-internal", "tag": "base", "system_description": ""}
{"task": "scheme_classification", "id": "informal_fallacies-test-412", "label": "epistemic-external", "tag": "base", "system_description": ""}
If you have any questions, please do not hesitate to contact us.
Evaluation
Evaluation uses a held-out test set for each sub-task and standard evaluation metrics: precision, recall, and F1-score.
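The section above names precision, recall, and F1 but does not specify the averaging mode; the sketch below assumes macro averaging over labels, computed from paired gold and predicted label lists.

```python
from collections import Counter

def macro_prf(gold, pred):
    """Macro-averaged precision, recall, and F1 over all observed labels.

    Macro averaging is an assumption; the official metric may differ.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```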
Related Work
- Fabrizio Macagno, Argumentation profiles and the manipulation of common ground. The arguments of populist leaders on Twitter, Journal of Pragmatics, Volume 191, 2022, Pages 67-82, ISSN 0378-2166.
- Saumya Sahai, Oana Balalau, and Roxana Horincar, Breaking Down the Invisible Wall of Informal Fallacies in Online Discussions, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), August 2021, pp. 644–657, doi:10.18653/v1/2021.acl-long.53.