Update README.md
README.md (changed)
@@ -52,7 +52,9 @@ We compared Moonlight with SOTA public models at similar scale:
 - **LLAMA3-3B** is a 3B-parameter dense model trained with 9T tokens
 - **Qwen2.5-3B** is a 3B-parameter dense model trained with 18T tokens
 - **Deepseek-v2-Lite** is a 2.4B/16B-parameter MOE model trained with 5.7T tokens
-
+
+<div align="center">
+
 | | **Benchmark (Metric)** | **Llama3.2-3B** | **Qwen2.5-3B** | **DSV2-Lite** | **Moonlight** |
 |---|---|---|---|---|---|
 | | Activated Param† | 2.81B | 2.77B | 2.24B | 2.24B |
@@ -70,6 +72,7 @@ We compared Moonlight with SOTA public models at similar scale:
 | | CMath | - | 80.0 | 58.4 | **81.1** |
 | **Chinese** | C-Eval | - | 75.0 | 60.3 | **77.2** |
 | | CMMLU | - | 75.0 | 64.3 | **78.2** |
+
 </div>

 *Qwen 2 & 2.5 reports didn't disclose their optimizer information. †The reported parameter counts exclude the embedding parameters. ‡We test all listed models with the full set of TriviaQA.*