Commit 5822157 (1 parent: 39f3e26)
Update content.py
content.py CHANGED (+10 -1)
@@ -21,7 +21,15 @@ SUBMISSION_TEXT = """
 ## Submissions
 Results can be submitted for both validation and test. Scores are expressed as the percentage of correct answers for a given split.
 
-
+Each question calls for an answer that is either a string (one or a few words), a number, or a comma separated list of strings or floats, unless specified otherwise. There is only one correct answer.
+Hence, evaluation is done via quasi exact match between a model’s answer and the ground truth (up to some normalization that is tied to the “type” of the ground truth).
+
+In our evaluation, we use a system prompt to instruct the model about the required format:
+```
+You are a general AI assistant. I will ask you a question. Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER]. YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise. If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise. If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.
+```
+We advise you to use the system prompt provided in the paper to ensure your agents answer using the correct and expected format. In practice, GPT4 level models easily follow it.
+
 
 We expect submissions to be json-line files with the following format. The first two fields are mandatory, `reasoning_trace` is optionnal:
 ```
@@ -29,6 +37,7 @@ We expect submissions to be json-line files with the following format. The first
 {"task_id": "task_id_2", "model_answer": "Answer 2 from your model", "reasoning_trace": "The different steps by which your model reached answer 2"}
 ```
 
+Our scoring function can be found [here](https://huggingface.co/spaces/gaia-benchmark/leaderboard/blob/main/scorer.py).
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
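The submission text added in this commit describes three steps: the model ends its reply with `FINAL ANSWER: ...`, the answer is compared to the ground truth by quasi exact match with type-dependent normalization, and results are submitted as json-line records. The sketch below illustrates one way these steps could fit together. It is not the leaderboard's `scorer.py`; the function names (`extract_final_answer`, `quasi_exact_match`, `to_submission_line`) and the specific normalization rules are assumptions made for illustration only.

```python
import json
import re


def extract_final_answer(response: str) -> str:
    """Return the text after the last 'FINAL ANSWER:' marker.

    Falls back to the whole response when the marker is absent.
    """
    marker = "FINAL ANSWER:"
    idx = response.rfind(marker)
    if idx == -1:
        return response.strip()
    return response[idx + len(marker):].strip()


def _as_number(text: str):
    """Parse text as a float after dropping '$', '%' and ','; None on failure."""
    cleaned = re.sub(r"[$%,]", "", text).strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def _normalize_string(text: str) -> str:
    """Lowercase and drop whitespace and punctuation (assumed normalization)."""
    return re.sub(r"[\W_]+", "", text.lower())


def _match_element(answer: str, truth: str) -> bool:
    """Compare one element, choosing the rule from the type of the ground truth."""
    truth_number = _as_number(truth)
    if truth_number is not None:
        answer_number = _as_number(answer)
        return answer_number is not None and answer_number == truth_number
    return _normalize_string(answer) == _normalize_string(truth)


def quasi_exact_match(answer: str, truth: str) -> bool:
    """Hypothetical quasi exact match: lists element-wise, otherwise one element."""
    if "," in truth:
        truth_items = truth.split(",")
        answer_items = answer.split(",")
        if len(answer_items) != len(truth_items):
            return False
        return all(_match_element(a, t) for a, t in zip(answer_items, truth_items))
    return _match_element(answer, truth)


def to_submission_line(task_id: str, response: str, include_trace: bool = True) -> str:
    """Build one json-line record with the fields named in the submission text."""
    record = {"task_id": task_id, "model_answer": extract_final_answer(response)}
    if include_trace:
        record["reasoning_trace"] = response  # optional field
    return json.dumps(record)
```

Writing one `to_submission_line(...)` result per line to a file would produce a file in the described format. Whether the official scorer treats currency signs, percent signs, or list lengths the same way should be checked against the linked `scorer.py`.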