Image Retrieval/Generation for Arguments 2024

Introduction

Let's look at the following argument:

Claim: Boxing is a dangerous sport!

Premise: Boxing can lead to serious injuries.

boxer gets a hard hit

Source: Sweating fighter is punched in the face - gettyimages

Chances are, you looked at the photo first, before you even read the whole argument. We can almost feel the pain of the blow, and hear the sound of the punch, making the premise much more convincing. This is the power of images, which appeal to our visual senses and emotions, making them more memorable than words. There is a difference between written language and images. Written language provides clear but limited information, while images provide more information than written words, but are not as precise. For example, in the picture above, we can see the color of the boxing glove and the appearance of the boxer, which are not mentioned by the words in the premise.

Synopsis

Task: Given an argument, find images that help to convey the argument's premise.

For this task, an argument consists of one claim and one premise. In addition, we provide the argument's topic and the premise's type. Premises are either facts from a study or anecdotal evidence and are labeled accordingly, so that participants can use different approaches for the two types.

"Convey" is meant in a general way; the image can show what is described in the premise, but it may also show a generalization (e.g., a meme image that illustrates a related abstract concept) or a specialization (e.g., a concrete example). The image can also refer to signs and symbols.

Interested?

Register now

If you would like to use the Stable Diffusion API for this task, just contact us and we will provide you with the details.

Data

The current version of the dataset can be found here. Besides the arguments, the corpus also contains a crawl of about 9000 images (and associated web pages) as the document collection. For the retrieved images we also provide additional information such as an automatically generated image caption. For participants favoring image generation, we provide access to a Stable Diffusion API. Contact us if you want access.

Submission

We allow three kinds of submissions.

  1. Retrieval. As in previous years, participants can retrieve suitable images from a focused crawl, where we also provide automatically recognized text from the image (OCR) and text from web pages that contain the image.
  2. Prompted Generation. Following the idea of the infinite index, participants can submit prompts for the Stable Diffusion image generator.
  3. Direct. Participants can employ other reproducible methods for generating images and directly submit them. This includes chart generators, which can generate a bar chart from given numbers in the premise. Also, one can use headline generators to transform the premise into a headline.

Images alone can be ambiguous and difficult to understand without context, e.g. if they refer to symbolism. That's why we offer the option to submit a rationale along with the image. The rationale is a piece of text that helps us understand the image. For example, it could be a caption or contextual information about the image. The image and rationale will be evaluated together to see how this combination conveys the premise.

Submission - Format

The submission is done through Tira. Each team can submit as many runs as it wants. A run is either a Docker image that generates the needed results or a file that contains the results directly. The exact specifications of this file depend on whether you have chosen the retrieval or the generation approach. If you want to try both methods, please submit separate runs.

            Image Retrieval
        
            If you have chosen the retrieval approach, please submit your results in a file called "results.jsonl". 
        
            Each JSON object in your "results.jsonl" file should have the following keys:
            
            argument_id - id of the argument in the arguments.xml file in the released dataset
            method - retrieval
            image_id - the image's ID - it corresponds to the name of the image's directory in the released dataset
            rationale - additional info or caption for the image to understand how it conveys the premise (optional)
            rank - specifies the preference of your image assignment (more below) - 1 is highest
            tag - tag defined by you and your group, identifies your group and the method you used to obtain the results
        
            An example submission for argument "65302-a-2" would look like the following:
        
            {
            "argument_id": "65302-a-2",
            "method": "retrieval",
            "image_id": "Iffdea3cd664722c736d7d667",
            "rationale": "space is the final frontier",
            "rank": 1,
            "tag": "touche organizers - example submission for image retrieval; manual selection of images"
            }
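
            The following is a minimal Python sketch of how such a file could be written, using only the keys listed above. The helper names retrieval_entry and to_jsonl are hypothetical and not part of any official tooling; the values are taken from the example object.

```python
import json


def retrieval_entry(argument_id, image_id, rank, tag, rationale=None):
    """Build one result object with the keys listed above (hypothetical helper)."""
    entry = {
        "argument_id": argument_id,
        "method": "retrieval",
        "image_id": image_id,
        "rank": rank,
        "tag": tag,
    }
    if rationale is not None:  # the rationale is optional
        entry["rationale"] = rationale
    return entry


def to_jsonl(entries):
    """Serialize entries as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(e, ensure_ascii=False) + "\n" for e in entries)


entries = [
    retrieval_entry(
        "65302-a-2",
        "Iffdea3cd664722c736d7d667",
        1,
        "touche organizers - example submission for image retrieval; manual selection of images",
        rationale="space is the final frontier",
    ),
]
jsonl_text = to_jsonl(entries)

# Write the run file under the name requested above.
with open("results.jsonl", "w", encoding="utf-8") as f:
    f.write(jsonl_text)
```

            Note that each object must occupy exactly one line in the file, unlike the pretty-printed example above.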
        
            Image Generation
        
            If you are using image generation, submit a file called "generation.zip", which should contain a JSONL file called "results.jsonl" 
            and a directory called "generated_images" containing the generated images.
        
            Please use the following keys for your JSON Objects in the JSONL file:
        
            argument_id - id of the argument in the arguments.xml file in the released dataset
            method - generation 
            prompt - the prompt that you have used to generate the image
            image_name - name of the generated image, which can be found in the generated_images directory
            rationale - additional info or caption for the image to understand how it conveys the premise (optional)
            rank - specifies the preference of your image assignment (more below)  - 1 is highest
            tag - tag defined by you and your group, identifies your group and the method you used to obtain the results
        
            An example looks like this:
        
            {
                "argument_id": "65302-a-2",
                "method": "generation",
                "prompt": "cat looking into the stars",
                "image_name": "space-pic1.jpg",
                "rationale": "space is fascinating",
                "rank": 1,
                "tag": "touche organizers - example submission for image generation; manual prompt engineering"
            }
                    
            The corresponding "generation.zip" for this submission would therefore have the following structure:
        
            - results.jsonl (file)
            - generated_images (directory)
              - space-pic1.jpg (file)
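
            As a rough sketch, an archive with this layout could be assembled with Python's standard zipfile module. The function build_generation_zip is a hypothetical helper, the image file in the demo is a placeholder rather than a real generated image, and the code assumes the image_name key as it appears in the example object above.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def build_generation_zip(entries, image_dir, zip_path):
    """Pack results.jsonl and the referenced images into one archive (hypothetical helper)."""
    image_dir = Path(image_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # One JSON object per line, stored at the top level of the archive.
        jsonl = "".join(json.dumps(e) + "\n" for e in entries)
        zf.writestr("results.jsonl", jsonl)
        for name in sorted({e["image_name"] for e in entries}):
            # Raises FileNotFoundError if a referenced image is missing on disk.
            zf.write(image_dir / name, arcname=f"generated_images/{name}")
    return zip_path


# Demo with a placeholder file standing in for a real generated image.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    images.mkdir()
    (images / "space-pic1.jpg").write_bytes(b"placeholder image bytes")
    entry = {
        "argument_id": "65302-a-2",
        "method": "generation",
        "prompt": "cat looking into the stars",
        "image_name": "space-pic1.jpg",
        "rationale": "space is fascinating",
        "rank": 1,
        "tag": "touche organizers - example submission for image generation; manual prompt engineering",
    }
    archive = build_generation_zip([entry], images, Path(tmp) / "generation.zip")
    with zipfile.ZipFile(archive) as zf:
        archive_names = zf.namelist()

print(archive_names)
```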

            
You can find more information about JSON Lines here and a results.jsonl example here. You can assign up to 10 images to the same argument ID. The rank key determines the preference order of your images for the corresponding argument, where 1 is the most relevant image. For example, if you submit 5 images for one argument, you need to use the rank values from 1 to 5. A run that contains multiple image assignments with the same rank for one argument is not valid. The linked results.jsonl example also contains multiple image assignments to the same argument.
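
Before submitting, it may help to check these rank rules locally. The following Python sketch is an unofficial illustration, not the validator used by the organizers; the function name check_ranks and the constant MAX_IMAGES_PER_ARGUMENT are assumptions made for this example.

```python
from collections import defaultdict

MAX_IMAGES_PER_ARGUMENT = 10  # "up to 10 images to the same argument ID"


def check_ranks(entries):
    """Return a list of problems; an empty list means the ranks look valid."""
    ranks_by_argument = defaultdict(list)
    for e in entries:
        ranks_by_argument[e["argument_id"]].append(e["rank"])

    problems = []
    for argument_id, ranks in ranks_by_argument.items():
        if len(ranks) > MAX_IMAGES_PER_ARGUMENT:
            problems.append(
                f"{argument_id}: more than {MAX_IMAGES_PER_ARGUMENT} images"
            )
        # n images for one argument must use each rank from 1 to n exactly once.
        if sorted(ranks) != list(range(1, len(ranks) + 1)):
            problems.append(
                f"{argument_id}: expected ranks 1..{len(ranks)}, got {sorted(ranks)}"
            )
    return problems


valid_run = [
    {"argument_id": "65302-a-2", "rank": 1},
    {"argument_id": "65302-a-2", "rank": 2},
]
invalid_run = [
    {"argument_id": "65302-a-2", "rank": 1},
    {"argument_id": "65302-a-2", "rank": 1},  # same rank used twice
]
print(check_ranks(valid_run))
print(check_ranks(invalid_run))
```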

Important Dates

  • Dec. 18, 2023: CLEF Registration opens. [register]
  • May 6, 2024: Approaches submission deadline.
  • May 31, 2024: Participant paper submission.
  • June 21, 2024: Peer review notification.
  • July 8, 2024: Camera-ready participant papers submission.
  • Sep. 9-12, 2024: CLEF Conference in Grenoble and Touché Workshop.

All deadlines are 23:59 anywhere on earth (UTC-12).

Evaluation

  • The premises are usually formulated in general terms. This allows for some interpretation and choices of conveying images.
  • If no rationale is given, we will use the prompt (generation) or the automatically created caption (retrieval) as the default rationale.
  • Try to avoid very generic images that become relevant only through the rationale. Images that could be used for many subjects will not get as many points as images that are specific to the premise. Also, if the generation approach is chosen, the image should not be too unrealistic, e.g., a person should not have four legs, although minor imperfections are fine. See this section for further information.

Examples

Example 1

Topic Boxing should be banned
Premise Boxing can cause serious injury to your body.
Stance Pro
Claim Boxing is dangerous.
Type Anecdotal

Submissions:

Sweating fighter is punched in the face
Rationale Heavy blows to the face are common in boxing.
Source retrieved (Sweating fighter is punched in the face - gettyimages)
generated by Stable Diffusion.
Rationale Concussion caused by heavy blows in boxing

Source retrieved (https://www.shutterstock.com/de/image-illustration/headache-5407147)
generated by Stable Diffusion.
Rationale It causes a lot of pain and hurt.

Source generated

Example 2

Topic This house believes that democratic governments should require voters to present photo identification at the polling station
Premise People will forget their IDs and cannot use their democratic right for voting
Stance Contra
Claim Use of Photo ID will result in exclusion of voters!
Type Anecdotal

Submissions:

reminder not to forget photo id
Rationale People need to be reminded not to forget their ID.
Source retrieved (Geoffrey Swaine/Shutterstock)
young man at voting station
Rationale Young Man in the voting station forgot his Photo ID.
Source generated
photo id
Rationale Photo IDs can be forgotten at home.
Source generated

Example 3

Topic Should performance-enhancing drugs be accepted in sports?
Premise Performance-enhancing drugs create inequality in sports and undermine the essence of fair competition.
Stance Contra
Claim Performance-enhancing drugs should not be accepted in sport.
Type Anecdotal

Submissions:

drugs and medals
Rationale With performance-enhancing drugs, you can win medals you don't deserve.
Source retrieved (https://sportsanddrugs.procon.org/)
hard training athlete
Rationale Hard training athletes cannot win if people are using performance enhancing drugs.

Source generated
hard training athlete
Rationale Hard training athletes cannot win if people are using performance enhancing drugs.

Source generated

A comment on using images

As we can see from these examples, a rationale can be critical to associating an image with a premise. Try to avoid images that are too generic and only relevant because of the rationale. Look at the photo of the surprised woman. It is not possible to make a connection to the general topic of voting and photo IDs from the image alone. The connection to the issue is only made through the rationale. It gives the impression that this image could be used for any topic, such as a young woman being surprised about the dangers of boxing. Such images receive a lower relevance rating. The same goes for obviously incorrectly generated images, such as the bar through the athlete's head.

Example 1

Topic This house believes that democratic governments should require voters to present photo identification at the polling station
Premise People will forget their IDs and cannot use their democratic right for voting
Stance Contra
Claim Use of Photo ID will result in exclusion of voters!
Type Anecdotal

Submission:

image of a shocked young woman
Rationale Young Woman forgot her Photo ID.
Source retrieved (Image by benzoix on Freepik)

Example 2

Topic Should performance-enhancing drugs be accepted in sports?
Premise Performance-enhancing drugs create inequality in sports and undermine the essence of fair competition.
Stance Contra
Claim Performance-enhancing drugs should not be accepted in sport.
Type Anecdotal

Submission:

athlete with unrealistic proportions
Rationale Hard training athletes cannot win if people are using performance enhancing drugs.

Source generated

Task Committee