How does LiveCodeBench test?

#58

by cizhenshi - opened 26 days ago

26 days ago

I would like to reproduce the LiveCodeBench results for QwQ-32B. Could you please tell me which code repository you used and what configuration was used for the testing?

wangxingjun778

15 days ago (edited 15 days ago)

Please refer to EvalScope: https://github.com/modelscope/evalscope
It supports LiveCodeBench evaluation for QwQ-32B :)

cizhenshi

15 days ago

Thanks for your reply!
Which hyperparameter configuration did you use to evaluate on LiveCodeBench?
The results I measured with the official LiveCodeBench configuration are lower than the ones you reported.

wangxingjun778

15 days ago

For the specific evaluation steps, please refer to the best-practice guide: https://evalscope.readthedocs.io/en/latest/best_practice/eval_qwq.html#evaluating-code-capability

We evaluated QwQ-32B using the official LiveCodeBench code implementation.
As you mentioned, our results are about 1 point lower than those given in the QwQ-32B technical report.
We suspect the gap comes from differences in prompt construction and inference parameter settings.
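
To narrow down a gap like this, it can help to write down the settings of both runs and compare them field by field. The snippet below is only a rough sketch: `RunSettings`, `diff_settings`, and every value in it are hypothetical placeholders for illustration, not the configuration used for the reported numbers and not part of EvalScope or LiveCodeBench.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical record of the settings behind one benchmark run."""
    prompt_template: str
    temperature: float
    top_p: float
    max_tokens: int
    samples_per_problem: int


def diff_settings(a: RunSettings, b: RunSettings) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    left, right = asdict(a), asdict(b)
    return {k: (left[k], right[k]) for k in left if left[k] != right[k]}


# Placeholder values for illustration only; not the settings used in this thread.
reported = RunSettings("template_a", 0.6, 0.95, 32768, 1)
reproduced = RunSettings("template_b", 0.6, 0.95, 16384, 1)

for name, (ours, theirs) in diff_settings(reported, reproduced).items():
    print(f"{name}: {ours!r} vs {theirs!r}")
```

Any field that shows up in the diff (here the prompt template and the token budget) is a candidate explanation for a score difference and is worth aligning before comparing numbers.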
