Evaluation Results

by PSM24 - opened Aug 28, 2024

Discussion

PSM24

Aug 28, 2024

Are the evaluation results correct? An MMLU score of 85 seems very high, especially for a 14B model…

JosephusCheung

CausalLM org Aug 28, 2024

We use opencompass with vllm for evaluation, with sampling enabled and temperature set to 0.3. We expect a better result with HF transformers, as the accelerated kernels may lead to significant degradation.
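To illustrate what sampling at temp=0.3 means, here is a minimal Python sketch. It is not the opencompass or vllm evaluation code; the function names and the example logits are hypothetical and chosen only for illustration. It shows that a temperature below 1 sharpens the distribution over answer choices, so outputs stay close to greedy decoding but are not fully deterministic.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answer(logits, temperature, rng):
    """Draw one answer index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical scores for answer choices A-D; not taken from any model.
    example_logits = [2.0, 1.0, 0.5, 0.0]
    for temp in (1.0, 0.3):
        probs = softmax_with_temperature(example_logits, temp)
        print(temp, [round(p, 3) for p in probs])
    print(sample_answer(example_logits, 0.3, random.Random(0)))
```

Because the answer is sampled rather than taken greedily, repeated runs can produce slightly different scores.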

MMLU is NOT a good proxy for parameter count. And this is not our first model to reach an 85+ MMLU.

It should be contamination-free, but we won't prove it yet: we did so last time, and people kept arguing regardless.

JosephusCheung changed discussion status to closed Aug 28, 2024
