lsw825 committed
Commit 4308c42 · verified · 1 Parent(s): d413755

Update README.md

Files changed (1): README.md (+4 −1)
README.md CHANGED

```diff
@@ -52,7 +52,9 @@ We compared Moonlight with SOTA public models at similar scale:
 - **LLAMA3-3B** is a 3B-parameter dense model trained with 9T tokens
 - **Qwen2.5-3B** is a 3B-parameter dense model trained with 18T tokens
 - **Deepseek-v2-Lite** is a 2.4B/16B-parameter MOE model trained with 5.7T tokens
-<div align="center">
+
+<div align="center">
+
 | | **Benchmark (Metric)** | **Llama3.2-3B** | **Qwen2.5-3B** | **DSV2-Lite** | **Moonlight** |
 |---|---|---|---|---|---|
 | | Activated Param† | 2.81B | 2.77B | 2.24B | 2.24B |
@@ -70,6 +72,7 @@ We compared Moonlight with SOTA public models at similar scale:
 | | CMath | - | 80.0 | 58.4 | **81.1** |
 | **Chinese** | C-Eval | - | 75.0 | 60.3 | **77.2** |
 | | CMMLU | - | 75.0 | 64.3 | **78.2** |
+
 </div>
 
 *Qwen 2 & 2.5 reports didn't disclose their optimizer information. †The reported parameter counts exclude the embedding parameters. ‡We test all listed models with the full set of TriviaQA.*
```