Update README.md
README.md
@@ -119,7 +119,7 @@ model-index:
 ---
 
 ## Model Summary
-PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a
+PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with 2x the active parameters across various benchmarks, including natural-language multiple-choice, code generation, and math reasoning.
 Paper: https://arxiv.org/abs/2408.13359
 
 ## Usage
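The `## Usage` section that follows this hunk in the README would typically load the model as a standard causal LM with `transformers`. A minimal sketch is below; the model ID `ibm/PowerMoE-3b` and the prompt are assumptions for illustration, not part of this diff.

```python
# Minimal usage sketch for a Hugging Face causal LM (model ID assumed, not confirmed by this diff).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm/PowerMoE-3b"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision load; only ~800M params are active per token
    device_map="auto",
)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```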