Terry Zhuo committed
Commit cd5ba8d
Parent(s): 52ee73d

fix: add more notes

Files changed:
- app.py +1 -1
- src/text_content.py +1 -1
app.py
CHANGED
@@ -226,7 +226,7 @@ with demo:
 - <u>Complete</u>: Code Completion based on the (verbose) structured docstring. This variant tests if the models are good at coding.
 - <u>Instruct</u> (🔥Vibe Check🔥): Code Generation based on the (less verbose) NL-oriented instructions. This variant tests if the models are really capable enough to understand human intents to code.
 - `complete` and `instruct` represent the calibrated Pass@1 score on the BigCodeBench benchmark variants.
-- `elo_mle` represents the task-level Bootstrap of Maximum Likelihood Elo rating on `BigCodeBench-Complete
+- `elo_mle` represents the task-level Bootstrap of Maximum Likelihood Elo rating on `BigCodeBench-Complete`, which starts from 1000 and is bootstrapped 500 times.
 - `size` is the amount of activated model weight during inference.
 - Model providers have the responsibility to avoid data contamination. Models trained on closed data can be affected by contamination.
 - For more details check the 📝 About section.
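The changed line above describes a maximum-likelihood (Bradley-Terry) Elo fit that starts every model at 1000 and is bootstrapped 500 times. A minimal sketch of that kind of procedure, assuming a list of pairwise (winner, loser) battle records; the function names, learning rate, and battle format are illustrative, not BigCodeBench's actual `get_results.py` implementation:

```python
import math
import random

def mle_elo(battles, base=10.0, scale=400.0, init=1000.0, iters=300, lr=1000.0):
    """Fit Bradley-Terry ratings on the Elo scale by gradient ascent.

    battles: list of (winner, loser) model-name pairs.
    Every model starts at `init` (1000, as in the leaderboard note).
    The learning rate is tuned for small battle counts.
    """
    models = sorted({m for pair in battles for m in pair})
    rating = {m: init for m in models}
    k = math.log(base) / scale  # slope of the Elo logistic in natural-log units
    for _ in range(iters):
        grad = {m: 0.0 for m in models}
        for w, l in battles:
            # expected score of the winner under the current ratings
            p = 1.0 / (1.0 + base ** ((rating[l] - rating[w]) / scale))
            grad[w] += k * (1.0 - p)
            grad[l] -= k * (1.0 - p)
        for m in models:
            rating[m] += lr * grad[m]
        # ratings are shift-invariant, so re-center the mean at `init`
        shift = init - sum(rating.values()) / len(models)
        for m in models:
            rating[m] += shift
    return rating

def bootstrap_elo(battles, rounds=500, seed=0):
    """Resample the battles with replacement `rounds` times (500, per the
    note), refit each time, and report the median rating per model."""
    rng = random.Random(seed)
    samples = {}
    for _ in range(rounds):
        resampled = [rng.choice(battles) for _ in battles]
        for m, r in mle_elo(resampled).items():
            samples.setdefault(m, []).append(r)
    return {m: sorted(rs)[len(rs) // 2] for m, rs in samples.items()}
```

For example, if model A beats B in 8 of 10 battles and B beats C in 8 of 10, the Bradley-Terry optimum separates each adjacent pair by 400·log10(4) ≈ 241 Elo points, which the fit above approaches.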
src/text_content.py
CHANGED
@@ -42,7 +42,7 @@ pip install bigcodebench[generate] --upgrade
 
 ### Scoring and Rankings
 - Models are ranked according to Pass@1 using greedy decoding. Setup details can be found <a href="https://github.com/bigcode-project/bigcodebench/blob/main/bigcodebench/generate.py">here</a>.
-- The code to compute Elo rating is based on [Chatbot Arena Notebook](https://colab.research.google.com/drive/1RAWb22-PFNI-X1gPVzc927SGUdfr6nsR#scrollTo=JdiJbB6pZB1B&line=2&uniqifier=1). We only compute the Elo rating for the `BigCodeBench-Complete` variant.
+- The code to compute Elo rating is [here](https://github.com/bigcode-project/bigcodebench/blob/main/analysis/get_results.py), which is based on [Chatbot Arena Notebook](https://colab.research.google.com/drive/1RAWb22-PFNI-X1gPVzc927SGUdfr6nsR#scrollTo=JdiJbB6pZB1B&line=2&uniqifier=1). We only compute the Elo rating for the `BigCodeBench-Complete` variant.
 
 ### Contact
 If you have any questions, feel free to reach out to us at [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected])
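The ranking note above uses Pass@1 under greedy decoding, where each task gets exactly one sample, so Pass@1 collapses to the fraction of tasks solved; the standard unbiased pass@k estimator (Chen et al., 2021), 1 − C(n−c, k)/C(n, k), generalizes this to n samples with c correct. A brief sketch with illustrative helper names (this is not the leaderboard's calibrated scoring code):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: n generated samples per task,
    c of which pass all tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def greedy_pass_at_1(solved):
    """Greedy decoding yields one sample per task, so Pass@1 is simply
    the fraction of tasks whose single solution passes all tests."""
    return sum(solved) / len(solved)
```

With n = 1 and k = 1, `pass_at_k` reduces to 1.0 when the single sample is correct and 0.0 otherwise, matching the greedy case.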