Evaluation Results
#5 opened by PSM272
Are the evaluation results correct? An MMLU score of 85 seems very high, especially for a 14B model…
We evaluated with OpenCompass using vLLM as the inference backend, with sampling enabled and temperature 0.3. We expect a better result with HF Transformers, since vLLM's accelerated kernels may cause significant degradation.
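To make "sampling enabled, temperature 0.3" concrete: a low temperature sharpens the model's output distribution, so sampled answers usually agree with the greedy (argmax) answer. The sketch below assumes the standard softmax(logits / T) formulation. It is not OpenCompass or vLLM code, and the logits for the four answer options are made-up values for illustration only.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Softmax over logits / temperature (standard temperature scaling)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng):
    """Draw one option index from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for answer options A-D of a single multiple-choice question.
logits = [2.0, 1.5, 0.5, 0.0]

for t in (1.0, 0.3):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])

# Count how often the top option is drawn at T=0.3 (seeded for reproducibility).
rng = random.Random(0)
draws = [sample_index(logits, 0.3, rng) for _ in range(1000)]
print("top option chosen:", draws.count(0), "of", len(draws))
```

Running it shows the top option's probability rising sharply from T=1.0 to T=0.3, which is why a low-temperature sampled run tends to land close to a greedy run, though it is still not deterministic.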
MMLU is NOT a good proxy for parameter count. And this is not our first model to reach an MMLU score of 85+.
It should be contamination-free, but we won't try to prove it yet: we did so last time, and people kept arguing regardless.
See also: https://huggingface.co/CausalLM/34b-beta/discussions/5
JosephusCheung changed discussion status to closed