- Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time
  Paper • 2408.13233 • Published • 20
- Heterogeneous Multi-task Learning with Expert Diversity
  Paper • 2106.10595 • Published • 1
- Residual Mixture of Experts
  Paper • 2204.09636 • Published • 1
- Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition
  Paper • 2307.05956 • Published • 1
Hazem Essam
hazemessam
AI & ML interests
Protein Language Modeling, Natural Language Processing, Generative Adversarial Networks.
Organizations
Collections
1
models
None public yet