rpand002 committed
Commit c5172f5 · verified · 1 Parent(s): dfb7ef8

Update README.md

Files changed (1):
  README.md (+1 -1)
README.md CHANGED
@@ -119,7 +119,7 @@ model-index:
 ---
 
 ## Model Summary
-PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a wide range of open-source and synthetic datasets with permissive licenses. PowerMoE-3B has shown promising results compared to other dense models with 2x active parameters across various benchmarks, including natural language multiple-choice tasks, code generation, and math reasoning.
+PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to other dense models with 2x active parameters across various benchmarks, including natural language multiple-choice tasks, code generation, and math reasoning.
 Paper: https://arxiv.org/abs/2408.13359
 
 ## Usage
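For context, the Model Summary above describes a causal language model that activates 800M of its 3B parameters per token; a minimal loading-and-generation sketch with Hugging Face `transformers` is shown below. This is not part of the commit itself, and the repo ID `ibm/PowerMoE-3b`, the prompt, and the generation settings are assumptions used for illustration only.

```python
# Minimal sketch (assumed repo ID and settings), not taken from this commit's README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm/PowerMoE-3b"  # assumed Hugging Face repo ID

# Load tokenizer and model; bfloat16 and device_map="auto" keep memory use modest.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Simple greedy-ish generation from a plain text prompt.
prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```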