TITLE = """
<h1 align="center" id="space-title">Critical Questions Generation Leaderboard</h1>
"""
INTRODUCTION_TEXT = """
# Task
Critical Questions Generation is the task of automatically generating questions that can unmask the assumptions held by the premises of an argumentative text.
This leaderboard benchmarks the capacity of language technology systems to generate Critical Questions (CQs): questions that should be asked in order to judge whether an argument is acceptable or fallacious.
The task consists of generating 3 Useful Critical Questions per argumentative text.
All details on the task, the dataset, and the evaluation can be found in the paper [Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models](https://arxiv.org/abs/2505.11341) or on the [Shared Task page](https://hitz-zentroa.github.io/shared-task-critical-questions-generation/).
"""
DATA_TEXT = """
## Data
The [CQs-Gen dataset](https://huggingface.co/datasets/HiTZ/CQs-Gen) gathers 220 interventions from real debates, divided into:
- `validation`: 186 interventions that can be used for training or validation, as each comes with ~25 reference questions already evaluated according to their usefulness (Useful, Unhelpful, or Invalid).
- `test`: 34 interventions. The reference questions of this set (~70 per intervention) are kept private to avoid data contamination. The questions generated for the test set are what should be submitted to this leaderboard.
## Evaluation
The evaluation is computed by comparing each of the 3 newly generated questions to the reference questions of the test set using Semantic Text Similarity, and inheriting the label of the most similar reference above a threshold of 0.65. Questions for which no reference passes the threshold are considered Invalid. See the evaluation function [here](https://huggingface.co/spaces/HiTZ/Critical_Questions_Leaderboard/blob/main/app.py#L141), or find more details in the [paper](https://arxiv.org/abs/2505.11341).
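The label-inheritance step can be sketched as follows. This is a minimal illustration, not the official evaluation code: the similarity scores here are placeholders, whereas the real evaluation computes Semantic Text Similarity between each generated question and every reference (see `app.py` in the Space).

```python
SIM_THRESHOLD = 0.65  # threshold used by the leaderboard

def inherit_label(similarities, reference_labels, threshold=SIM_THRESHOLD):
    # Given similarity scores against each reference question and the
    # references' usefulness labels, return the label of the most similar
    # reference above the threshold, or "Invalid" if none qualifies.
    best_idx, best_sim = max(enumerate(similarities), key=lambda p: p[1])
    if best_sim >= threshold:
        return reference_labels[best_idx]
    return "Invalid"

# Hypothetical example: three references with pre-computed similarities.
labels = ["Useful", "Unhelpful", "Useful"]
print(inherit_label([0.42, 0.71, 0.30], labels))  # → Unhelpful
print(inherit_label([0.10, 0.20, 0.15], labels))  # → Invalid
```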
## Leaderboard
"""
SUBMISSION_TEXT = """
## Submissions
Results can be submitted for the test set only.
We expect submissions to be JSON files with the following format:
```json
{
"CLINTON_1_1": {
"intervention_id": "CLINTON_1_1",
"intervention": "CLINTON: \"The central question in this election is really what kind of country we want to be and what kind of future we 'll build together\nToday is my granddaughter 's second birthday\nI think about this a lot\nwe have to build an economy that works for everyone , not just those at the top\nwe need new jobs , good jobs , with rising incomes\nI want us to invest in you\nI want us to invest in your future\njobs in infrastructure , in advanced manufacturing , innovation and technology , clean , renewable energy , and small business\nmost of the new jobs will come from small business\nWe also have to make the economy fairer\nThat starts with raising the national minimum wage and also guarantee , finally , equal pay for women 's work\nI also want to see more companies do profit-sharing\"",
"dataset": "US2016",
"cqs": [
{
"id": 0,
"cq": "What does the author mean by \"build an economy that works for everyone, not just those at the top\"?"
},
{
"id": 1,
"cq": "What is the author's definition of \"new jobs\" and \"good jobs\"?"
},
{
"id": 2,
"cq": "How will the author's plan to \"make the economy fairer\" benefit the working class?"
}
]
},
...
}
```
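Before submitting, you can sanity-check your file against this format. The snippet below is a hypothetical helper, not part of the official evaluation code; it only verifies the fields shown above and that each entry has exactly 3 questions.

```python
REQUIRED_KEYS = {"intervention_id", "intervention", "dataset", "cqs"}

def validate_submission(data):
    # Lightweight format check for a submission dict; returns a list of
    # human-readable problems (empty list means the format looks fine).
    errors = []
    for key, entry in data.items():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            errors.append(f"{key}: missing fields {sorted(missing)}")
            continue
        if entry["intervention_id"] != key:
            errors.append(f"{key}: intervention_id does not match key")
        if len(entry["cqs"]) != 3:
            errors.append(f"{key}: expected 3 questions, got {len(entry['cqs'])}")
    return errors

# Minimal example with placeholder text.
sample = {
    "CLINTON_1_1": {
        "intervention_id": "CLINTON_1_1",
        "intervention": "...",
        "dataset": "US2016",
        "cqs": [{"id": i, "cq": "..."} for i in range(3)],
    }
}
print(validate_submission(sample))  # → []
```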
After clicking 'Submit Eval', wait a couple of minutes before refreshing.
If you find any issues, please email blanca.calvo@ehu.eus.
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""@inproceedings{figueras2025benchmarkingcriticalquestionsgeneration,
title={Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models},
author={Calvo Figueras, Blanca and Rodrigo Agerri},
year={2025},
booktitle={2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)},
organization={Association for Computational Linguistics (ACL)},
url={https://arxiv.org/abs/2505.11341},
}"""
def format_error(msg):
    """Render an error message as red, centered HTML."""
    return f"<p style='color: red; font-size: 20px; text-align: center;'>{msg}</p>"


def format_warning(msg):
    """Render a warning message as orange, centered HTML."""
    return f"<p style='color: orange; font-size: 20px; text-align: center;'>{msg}</p>"


def format_log(msg):
    """Render a success/log message as green, centered HTML."""
    return f"<p style='color: green; font-size: 20px; text-align: center;'>{msg}</p>"


def model_hyperlink(link, model_name):
    """Return an HTML link to a model page, opening in a new tab."""
    return f'<a target="_blank" href="{link}" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">{model_name}</a>'