Image Retrieval/Generation for Arguments 2025

Introduction

Images are a powerful tool for conveying meaning and enhancing understanding. Arguments, on the other hand, are often perceived as formal and logical. In this shared task, we aim to combine these two areas and explore how images can effectively illustrate key aspects of arguments, enhancing their clarity and impact.

Synopsis

  • Task: Given an argument, find (retrieve or generate) images that help to convey central aspects of the argument's claim.
  • Communication: [mailing lists: participants, organizers]
  • Data: [download]
  • Submission: [validator]

An argument typically consists of a claim supported by one or more premises, which provide evidence for its validity. To maintain brevity in our data creation process, we designed arguments to be concise. Consequently, each argument in our dataset includes only a single claim without supporting premises.

Example image for the argument "Boxing leads to serious injuries":

[Image: a boxer takes a hard punch to the face. Source: "Sweating fighter is punched in the face", Getty Images]

The final dataset is now released and can be found here.

Important Dates

Subscribe to the Touché mailing list to receive notifications.

  • 2024-11: CLEF Registration opened [register]
  • 2025-05-10: Approaches submission deadline
  • 2025-05-30: Participant paper submission
  • 2025-06-10: Peer review notification
  • 2025-07-07: Camera-ready participant papers submission
  • 2025-09: CLEF Conference in Madrid and Touché Workshop

All deadlines are 23:59 CEST (UTC+2).

Data

The latest version of the dataset is on Zenodo. The arguments.xml contains the claims for which images should be found. The dataset also contains the image collection for retrieval. Participants who wish to generate images will also have access to a Stable Diffusion API. For access, please leave us a message.
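A minimal sketch for loading the claims from arguments.xml is shown below. The element and attribute names used here (argument, id, claim) are assumptions; check the actual schema shipped with the Zenodo dataset and adjust accordingly.

```python
# Sketch: load argument IDs and claim texts from arguments.xml.
# The tag and attribute names (argument, id, claim) are assumptions about the schema.
import xml.etree.ElementTree as ET


def load_arguments(path: str) -> dict[str, str]:
    """Map each argument ID to its claim text."""
    tree = ET.parse(path)
    arguments = {}
    for argument in tree.getroot().iter("argument"):
        arg_id = argument.get("id")
        claim = argument.findtext("claim", default="").strip()
        arguments[arg_id] = claim
    return arguments


if __name__ == "__main__":
    for arg_id, claim in load_arguments("arguments.xml").items():
        print(arg_id, claim)
```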

Submission

Our submission system in TIRA will open soon. You can use our [validator] to check whether your submission file is valid. Each team can submit up to 5 runs (as a Docker image or a results file).

For a run with retrieved images, submit a JSONL file. Each line in the JSONL file should be in the following JSON format:

  • argument_id: The ID of the argument in the arguments.xml file in the dataset for which the image was retrieved.
  • method: The string "retrieval".
  • image_id: The image's ID (name of the image's directory in the dataset).
  • rank: The rank of the image in your result list for the argument (starting at 1 per argument, up to 10).
  • tag: A tag that identifies your group and the method you used to produce the run.

Example JSON line:
{
    "argument_id": "1-5",
    "method": "retrieval",
    "image_id": "I002e616104f6ec04fd1a24d5",
    "rank": 1,
    "tag": "touche organizers - example submission for image retrieval; manual selection of images"
}
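A minimal sketch for writing such a retrieval run file is given below. The mapping `ranked_images` (argument ID to a ranked list of image IDs) is hypothetical and stands in for the output of your retrieval method.

```python
# Sketch: write a retrieval run as JSONL, one object per retrieved image.
# `ranked_images` is a hypothetical mapping: argument_id -> ranked list of image IDs.
import json


def write_retrieval_run(ranked_images: dict[str, list[str]], tag: str, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for argument_id, image_ids in ranked_images.items():
            # At most 10 images per argument, ranks starting at 1.
            for rank, image_id in enumerate(image_ids[:10], start=1):
                line = {
                    "argument_id": argument_id,
                    "method": "retrieval",
                    "image_id": image_id,
                    "rank": rank,
                    "tag": tag,
                }
                f.write(json.dumps(line) + "\n")
```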

For a run with generated images, submit a ZIP file which contains a results.jsonl file and the generated images in a directory called generated_images. Each line in the results.jsonl file should be in the following JSON format:

  • argument_id: The ID of the argument in the arguments.xml file in the dataset for which the image was generated.
  • method: The string "generation".
  • prompt: The prompt you used to generate the image.
  • image: Filename of the image in the generated_images directory.
  • rank: The rank of the image in your result list for the argument (starting at 1 per argument, up to 10).
  • tag: A tag that identifies your group and the method you used to produce the run.

Example JSON line:
{
    "argument_id": "1-5",
    "method": "generation",
    "prompt": "Worker loosing job due to automation",
    "image": "1-5-1.jpg",
    "rank": 1,
    "tag": "touche organizers - example submission for image generation; manual prompt engineering"
}
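The sketch below illustrates one way to package such a generation run as a ZIP archive containing results.jsonl and the generated_images directory. The input list `generated` (tuples of argument ID, prompt, local image path, and rank) is a hypothetical placeholder for the output of your generation pipeline.

```python
# Sketch: package a generation run as a ZIP with results.jsonl and generated_images/.
# `generated` is a hypothetical list of (argument_id, prompt, local_image_path, rank) tuples.
import json
import os
import zipfile


def write_generation_run(generated: list[tuple[str, str, str, int]],
                         tag: str, zip_path: str) -> None:
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        lines = []
        for argument_id, prompt, image_path, rank in generated:
            image_name = os.path.basename(image_path)
            # Copy the image file into the generated_images directory of the archive.
            zf.write(image_path, arcname=f"generated_images/{image_name}")
            lines.append(json.dumps({
                "argument_id": argument_id,
                "method": "generation",
                "prompt": prompt,
                "image": image_name,
                "rank": rank,
                "tag": tag,
            }))
        zf.writestr("results.jsonl", "\n".join(lines) + "\n")
```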

Evaluation

Runs will be assessed manually using top-k pooling. We identified two key aspects for each argument and will check whether the submitted images represent these aspects. Runs will be compared using nDCG.
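For reference, the sketch below shows an illustrative nDCG@10 computation for a single argument, assuming graded relevance judgments from the manual assessment; the organizers' exact gain values and cutoff may differ.

```python
# Sketch: nDCG@k for one argument's ranked result list.
# `relevances` holds graded relevance labels in rank order (assumed, e.g., 0-2).
import math


def ndcg_at_k(relevances: list[int], k: int = 10) -> float:
    def dcg(rels: list[int]) -> float:
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels[:k], start=1))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```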

Task Committee