CCMat's Collections
BlackMamba: Mixture of Experts for State-Space Models (arXiv:2402.01771)
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models (arXiv:2402.01739)
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (arXiv:2401.15947)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066)
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (arXiv:2401.04081)
Mixtral of Experts (arXiv:2401.04088)
Scaling Laws for Fine-Grained Mixture of Experts (arXiv:2402.07871)
Mixtures of Experts Unlock Parameter Scaling for Deep RL (arXiv:2402.08609)
Multi-Head Mixture-of-Experts (arXiv:2404.15045)
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (arXiv:2404.02258)
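
Every paper in this collection builds on sparse Mixture-of-Experts layers, so the sketch below shows the common top-k routed MoE pattern in PyTorch. It is a minimal illustration only: the class name TopKMoE, the expert/router shapes, and all hyperparameters are assumptions for the example, not the implementation of any listed paper.

```python
# Minimal top-k routed MoE layer (illustrative only; shapes and names are
# assumptions for this sketch, not taken from any paper in the collection).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Experts are independent feed-forward networks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for per-token routing.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                      # (n_tokens, n_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # renormalize over chosen experts
        out = torch.zeros_like(tokens)
        # Dispatch each token to its selected experts and mix the outputs.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TopKMoE()
    y = layer(torch.randn(2, 16, 512))
    print(y.shape)  # torch.Size([2, 16, 512])
```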