Evaluation Results

#5
by PSM272 - opened

Are the evaluation results correct? An MMLU score of 85 seems remarkably high, especially for a 14B model…

CausalLM org

We use OpenCompass with vLLM for evaluation, with sampling enabled and temperature set to 0.3. We expect better results with HF Transformers, as the accelerated kernels may cause significant degradation.
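For readers unfamiliar with what a sampling temperature of 0.3 does, here is a minimal pure-Python sketch. It is not the OpenCompass or vLLM implementation; the function names and the logit values are hypothetical and chosen only for illustration. It shows that dividing logits by a temperature below 1 concentrates probability on the top-scoring option, so sampled answers stay close to greedy decoding while still being stochastic.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; temperature 0 means greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_choice(logits, temperature, rng):
    """Draw one option index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for the four answer options A-D (illustrative values only).
logits = [2.0, 1.0, 0.5, 0.1]

probs_t1 = softmax_with_temperature(logits, 1.0)    # plain softmax
probs_t03 = softmax_with_temperature(logits, 0.3)   # the temperature mentioned above

rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
picks = [sample_choice(logits, 0.3, rng) for _ in range(1000)]

print("P(top option) at T=1.0:", round(probs_t1[0], 3))
print("P(top option) at T=0.3:", round(probs_t03[0], 3))
print("share of samples picking the top option:", picks.count(0) / len(picks))
```

Because sampling is on, repeated evaluation runs can give slightly different scores unless the seed is fixed.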

MMLU is NOT a good proxy for parameter count, and this is not our first model to reach an 85+ MMLU score.

It should be contamination-free, but we won't try to prove it for now: we did so last time, and people kept arguing regardless.

See also: https://huggingface.co/CausalLM/34b-beta/discussions/5

JosephusCheung changed discussion status to closed
