Update README.md
README.md
@@ -119,7 +119,7 @@ model-index:
 ---
 
 ## Model Summary
-PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a
+PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with 2x the active parameters across various benchmarks, including natural-language multiple-choice, code generation, and math reasoning.
 Paper: https://arxiv.org/abs/2408.13359
 
 ## Usage
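The `## Usage` section that follows this hunk in the README would typically load the model as a standard causal LM with `transformers`. A minimal sketch is below; the model ID `ibm/PowerMoE-3b` and the prompt are assumptions for illustration, not part of this diff.

```python
# Minimal usage sketch for a Hugging Face causal LM (model ID assumed, not confirmed by this diff).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm/PowerMoE-3b"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision load; only ~800M params are active per token
    device_map="auto",
)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```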