Image Retrieval/Generation for Arguments 2025

Introduction

Images are a powerful tool for conveying meaning and enhancing understanding. Arguments, on the other hand, are often perceived as formal and logical. In this shared task, we aim to combine these two areas and explore how images can effectively illustrate key aspects of arguments, enhancing their clarity and impact.

Synopsis

  • Task: Given an argument, find (retrieve or generate) images that help to convey central aspects of the argument's claim.
  • Communication: [mailing lists: participants, organizers]
  • Data: [download]
  • Submission: [validator]

An argument typically consists of a claim supported by one or more premises, which provide evidence for its validity. To maintain brevity in our data creation process, we designed arguments to be concise. Consequently, each argument in our dataset includes only a single claim without supporting premises.

Example image for the argument "Boxing leads to serious injuries":

[Image: a boxer takes a hard punch to the face. Source: "Sweating fighter is punched in the face", Getty Images]

The final dataset is now released and can be found here.

Important Dates

Subscribe to the Touché mailing list to receive notifications.

  • 2024-11: CLEF Registration opened [register]
  • 2025-05-10: Approaches submission deadline
  • 2025-05-30: Participant paper submission
  • 2025-06-10: Peer review notification
  • 2025-07-07: Camera-ready participant papers submission
  • 2025-09: CLEF Conference in Madrid and Touché Workshop

All deadlines are 23:59 CEST (UTC+2).

Data

The latest version of the dataset is on Zenodo. The arguments.xml contains the claims for which images should be found. The dataset also contains the image collection for retrieval. Participants who wish to generate images will also have access to a Stable Diffusion API. For access, please leave us a message.
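A minimal sketch for loading the claims from arguments.xml is shown below. The element and attribute names used here (argument, id, claim) are assumptions; check the actual schema shipped with the Zenodo dataset and adjust accordingly.

```python
# Sketch: load argument IDs and claim texts from arguments.xml.
# The tag and attribute names (argument, id, claim) are assumptions about the schema.
import xml.etree.ElementTree as ET


def load_arguments(path: str) -> dict[str, str]:
    """Map each argument ID to its claim text."""
    tree = ET.parse(path)
    arguments = {}
    for argument in tree.getroot().iter("argument"):
        arg_id = argument.get("id")
        claim = argument.findtext("claim", default="").strip()
        arguments[arg_id] = claim
    return arguments


if __name__ == "__main__":
    for arg_id, claim in load_arguments("arguments.xml").items():
        print(arg_id, claim)
```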

Submission

Our submission system in TIRA will open soon. You can use our [validator] to check whether your submission file is valid. Each team can submit up to 5 runs (as a Docker image or a results file).

For a run with retrieved images, submit a JSONL file. Each line in the JSONL file should be in the following JSON format:

  • argument_id: The ID of the argument in the arguments.xml file in the dataset for which the image was retrieved.
  • method: The string "retrieval".
  • image_id: The image's ID (name of the image's directory in the dataset).
  • rank: The rank of the image in your result list for the argument (starting at 1 per argument, up to 10).
  • tag: A tag that identifies your group and the method you used to produce the run.

Example JSON line:
{
    "argument_id": "1-5",
    "method": "retrieval",
    "image_id": "I002e616104f6ec04fd1a24d5",
    "rank": 1,
    "tag": "touche organizers - example submission for image retrieval; manual selection of images"
}
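A minimal sketch for writing such a retrieval run file is given below. The mapping `ranked_images` (argument ID to a ranked list of image IDs) is hypothetical and stands in for the output of your retrieval method.

```python
# Sketch: write a retrieval run as JSONL, one object per retrieved image.
# `ranked_images` is a hypothetical mapping: argument_id -> ranked list of image IDs.
import json


def write_retrieval_run(ranked_images: dict[str, list[str]], tag: str, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        for argument_id, image_ids in ranked_images.items():
            # At most 10 images per argument, ranks starting at 1.
            for rank, image_id in enumerate(image_ids[:10], start=1):
                line = {
                    "argument_id": argument_id,
                    "method": "retrieval",
                    "image_id": image_id,
                    "rank": rank,
                    "tag": tag,
                }
                f.write(json.dumps(line) + "\n")
```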

For a run with generated images, submit a ZIP file which contains a results.jsonl file and the generated images in a directory called generated_images. Each line in the results.jsonl file should be in the following JSON format:

  • argument_id: The ID of the argument in the arguments.xml file in the dataset for which the image was generated.
  • method: The string "generation".
  • prompt: The prompt you used to generate the image.
  • image: Filename of the image in the generated_images directory.
  • rank: The rank of the image in your result list for the argument (starting at 1 per argument, up to 10).
  • tag: A tag that identifies your group and the method you used to produce the run.

Example JSON line:
{
    "argument_id": "1-5",
    "method": "generation",
    "prompt": "Worker loosing job due to automation",
    "image": "1-5-1.jpg",
    "rank": 1,
    "tag": "touche organizers - example submission for image generation; manual prompt engineering"
}
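The sketch below illustrates one way to package such a generation run as a ZIP archive containing results.jsonl and the generated_images directory. The input list `generated` (tuples of argument ID, prompt, local image path, and rank) is a hypothetical placeholder for the output of your generation pipeline.

```python
# Sketch: package a generation run as a ZIP with results.jsonl and generated_images/.
# `generated` is a hypothetical list of (argument_id, prompt, local_image_path, rank) tuples.
import json
import os
import zipfile


def write_generation_run(generated: list[tuple[str, str, str, int]],
                         tag: str, zip_path: str) -> None:
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        lines = []
        for argument_id, prompt, image_path, rank in generated:
            image_name = os.path.basename(image_path)
            # Copy the image file into the generated_images directory of the archive.
            zf.write(image_path, arcname=f"generated_images/{image_name}")
            lines.append(json.dumps({
                "argument_id": argument_id,
                "method": "generation",
                "prompt": prompt,
                "image": image_name,
                "rank": rank,
                "tag": tag,
            }))
        zf.writestr("results.jsonl", "\n".join(lines) + "\n")
```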

Evaluation

Runs will be assessed manually using top-k pooling. We identified two key aspects for each argument and will check whether the submitted images represent these aspects. Runs will be compared using nDCG.
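For reference, the sketch below shows an illustrative nDCG@10 computation for a single argument, assuming graded relevance judgments from the manual assessment; the organizers' exact gain values and cutoff may differ.

```python
# Sketch: nDCG@k for one argument's ranked result list.
# `relevances` holds graded relevance labels in rank order (assumed, e.g., 0-2).
import math


def ndcg_at_k(relevances: list[int], k: int = 10) -> float:
    def dcg(rels: list[int]) -> float:
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels[:k], start=1))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```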

Task Committee