IFEval score of 55 on inst-level loose accuracy
#70, opened by Jamesunnc
As per the title,
I ran IFEval (https://huggingface.co/datasets/google/IFEval) on the QwQ-32B model, using OpenCompass for the evaluation. I got about 55 on inst-level loose accuracy, rather than the 83.9 in the official report. Do you have any idea why? BTW, I did get 90 on the Llama 3.1 70B model with exactly the same script, so I don't think the OpenCompass script is the cause.
@qwen-llm