QwenMath-0.5B / README.md
metadata
language:
  - en
license: mit
datasets:
  - fdyrd/MATH
base_model:
  - Qwen/Qwen2.5-0.5B
library_name: transformers
tags:
  - text-generation-inference
metrics:
  - accuracy

QwenMath

A text-generation LLM for solving math problems, fine-tuned from Qwen/Qwen2.5-0.5B with LoRA on the fdyrd/MATH dataset.
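A minimal sketch of how the model might be queried with `transformers`. The repository id `fdyrd/QwenMath-0.5B` is an assumption inferred from the model name and the `fdyrd` namespace of the training dataset, and the prompt format in `build_prompt` is hypothetical, since the card does not document the template used during fine-tuning. If the repository holds only LoRA adapter weights rather than a merged model, loading may additionally require `peft`.

```python
# Assumed repository id; not confirmed by the model card.
MODEL_ID = "fdyrd/QwenMath-0.5B"


def build_prompt(problem: str) -> str:
    """Wrap a problem in a plain prompt (hypothetical format)."""
    return f"Problem: {problem.strip()}\nSolution:"


def solve(problem: str, max_new_tokens: int = 256) -> str:
    """Generate a solution with greedy decoding."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `solve("What is 12 * 7?")` downloads the weights on first use, so it needs network access and a working PyTorch installation.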

Training Statistics

training-method: lora
training-time: "5:42"
data-size: 500
epoch: 3
total_flos: "1372250GF"
train_loss: 0.6441
train_samples_per_second: 4.385
train_steps_per_second: 0.544
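The statistics above can be cross-checked with a little arithmetic. The sketch below assumes that `train_samples_per_second` and `train_steps_per_second` are averaged over the whole run and that every epoch processes all 500 samples. Under those assumptions the throughput figures imply a runtime that matches `training-time` read as minutes:seconds, and an effective batch size of about 8. The batch size is an inference, not a value stated in this card.

```python
# Values copied from the training statistics.
data_size = 500
epochs = 3
samples_per_second = 4.385
steps_per_second = 0.544

# Total samples seen across all epochs.
total_samples = data_size * epochs

# Implied wall-clock runtime in seconds.
runtime_s = total_samples / samples_per_second
minutes, seconds = divmod(round(runtime_s), 60)

# Implied number of optimizer steps and samples per step.
total_steps = round(runtime_s * steps_per_second)
effective_batch = total_samples / total_steps

print(f"runtime ~ {minutes}:{seconds:02d}")
print(f"steps ~ {total_steps}, samples per step ~ {effective_batch:.2f}")
```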

Validation Set Performance

Dataset used: test split of fdyrd/MATH. Metric: accuracy. Each cell in the table reads accuracy : number of problems.

| Level | Algebra | Intermediate Algebra | Prealgebra | Precalculus | Number Theory | Geometry | Counting & Probability | Average |
|---|---|---|---|---|---|---|---|---|
| Level 1 | 0.541 : 135 | 0.192 : 52 | 0.477 : 86 | 0.228 : 57 | 0.467 : 30 | 0.263 : 38 | 0.359 : 39 | 0.361 |
| Level 2 | 0.323 : 201 | 0.109 : 128 | 0.367 : 177 | 0.044 : 113 | 0.38 : 92 | 0.134 : 82 | 0.248 : 101 | 0.229 |
| Level 3 | 0.291 : 261 | 0.046 : 195 | 0.308 : 224 | 0.0 : 127 | 0.262 : 122 | 0.088 : 102 | 0.16 : 100 | 0.165 |
| Level 4 | 0.18 : 283 | 0.024 : 248 | 0.22 : 191 | 0.009 : 114 | 0.169 : 142 | 0.064 : 125 | 0.09 : 111 | 0.108 |
| Level 5 | 0.088 : 307 | 0.004 : 280 | 0.104 : 193 | 0.0 : 135 | 0.136 : 154 | 0.023 : 132 | 0.065 : 123 | 0.06 |
| Average | 0.285 | 0.075 | 0.295 | 0.056 | 0.283 | 0.114 | 0.184 | 0.166 |
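The averages in the table appear to mix two conventions. The sketch below assumes each cell is an (accuracy, problem count) pair. Under that reading, the per-level Average column matches an unweighted mean over the seven subjects, while the bottom-right value of 0.166 matches the accuracy weighted by problem count over the whole test split. Because the inputs are the rounded accuracies from the table, results are compared only to three decimals.

```python
# (accuracy, problem count) per subject, in the table's column order:
# Algebra, Intermediate Algebra, Prealgebra, Precalculus,
# Number Theory, Geometry, Counting & Probability.
table = {
    "Level 1": [(0.541, 135), (0.192, 52), (0.477, 86), (0.228, 57),
                (0.467, 30), (0.263, 38), (0.359, 39)],
    "Level 2": [(0.323, 201), (0.109, 128), (0.367, 177), (0.044, 113),
                (0.38, 92), (0.134, 82), (0.248, 101)],
    "Level 3": [(0.291, 261), (0.046, 195), (0.308, 224), (0.0, 127),
                (0.262, 122), (0.088, 102), (0.16, 100)],
    "Level 4": [(0.18, 283), (0.024, 248), (0.22, 191), (0.009, 114),
                (0.169, 142), (0.064, 125), (0.09, 111)],
    "Level 5": [(0.088, 307), (0.004, 280), (0.104, 193), (0.0, 135),
                (0.136, 154), (0.023, 132), (0.065, 123)],
}

# Unweighted (macro) mean per level: every subject counts equally.
macro_by_level = {
    level: sum(acc for acc, _ in cells) / len(cells)
    for level, cells in table.items()
}

# Weighted (micro) accuracy: every problem counts equally.
all_cells = [cell for cells in table.values() for cell in cells]
total_problems = sum(n for _, n in all_cells)
micro_overall = sum(acc * n for acc, n in all_cells) / total_problems

for level, value in macro_by_level.items():
    print(f"{level}: macro average = {value:.3f}")
print(f"problems: {total_problems}, overall accuracy = {micro_overall:.3f}")
```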

Test Set Performance

[
  {
    "dataset": "MATH500",
    "url": "https://huggingface.co/datasets/qq8933/MATH500",
    "accuracy": 0.286
  },
  {
    "dataset": "GSM8K",
    "url": "https://huggingface.co/datasets/openai/gsm8k",
    "accuracy": 0.382
  }
]
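The card does not say how accuracy was scored. One common approach for MATH-style benchmarks is exact match on the final `\boxed{...}` answer, sketched below as an illustration only. The helper names `extract_boxed` and `accuracy` are hypothetical, and the actual evaluation may normalize answers differently. GSM8K reference answers use a different final-answer marker, so they would need their own extraction step.

```python
from typing import Optional, Sequence


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward, tracking nested braces such as \frac{1}{2}.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # Unbalanced braces: treat as no answer.


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions whose boxed answer equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    correct = sum(
        extract_boxed(pred) == ref.strip()
        for pred, ref in zip(predictions, references)
    )
    return correct / len(predictions)
```

Exact string match is strict: equivalent forms such as `0.5` and `\frac{1}{2}` count as different answers unless a normalization step is added.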