Update README.md
README.md (changed)
@@ -52,7 +52,9 @@ We compared Moonlight with SOTA public models at similar scale:
 - **LLAMA3-3B** is a 3B-parameter dense model trained with 9T tokens
 - **Qwen2.5-3B** is a 3B-parameter dense model trained with 18T tokens
 - **Deepseek-v2-Lite** is a 2.4B/16B-parameter MOE model trained with 5.7T tokens
-
+
+<div align="center">
+
 | | **Benchmark (Metric)** | **Llama3.2-3B** | **Qwen2.5-3B** | **DSV2-Lite** | **Moonlight** |
 |---|---|---|---|---|---|
 | | Activated Param† | 2.81B | 2.77B | 2.24B | 2.24B |
@@ -70,6 +72,7 @@ We compared Moonlight with SOTA public models at similar scale:
 | | CMath | - | 80.0 | 58.4 | **81.1** |
 | **Chinese** | C-Eval | - | 75.0 | 60.3 | **77.2** |
 | | CMMLU | - | 75.0 | 64.3 | **78.2** |
+
 </div>

 *Qwen 2 & 2.5 reports didn't disclose their optimizer information. †The reported parameter counts exclude the embedding parameters. ‡We test all listed models with the full set of TriviaQA.*