Wow, this is amazing! 🤯 Samba is a powerful hybrid model with unlimited context length, built by stacking Mamba, MLP, Sliding Window Attention, and MLP blocks layer-wise. Its largest version, Samba-3.8B, trained on 3.2 trillion tokens, excels on benchmarks like MMLU, GSM8K, and HumanEval, and shines in long-context tasks with minimal tuning.

---

Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"
GitHub: https://github.com/microsoft/Samba
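To make the layer-wise stacking concrete, here is a minimal PyTorch sketch of one such hybrid layer. This is not the official implementation: the Mamba block is replaced by a simple gated causal-convolution stand-in, the sliding window attention is emulated with a masked `nn.MultiheadAttention`, and the class names (`MLPBlock`, `SlidingWindowAttention`, `MambaStandIn`, `SambaLayer`) are hypothetical.

```python
import torch
import torch.nn as nn


class MLPBlock(nn.Module):
    """Position-wise feed-forward block with a residual connection."""
    def __init__(self, d_model: int, expansion: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, expansion * d_model),
            nn.GELU(),
            nn.Linear(expansion * d_model, d_model),
        )

    def forward(self, x):
        return x + self.net(x)


class SlidingWindowAttention(nn.Module):
    """Stand-in SWA: full attention restricted by a causal sliding-window mask."""
    def __init__(self, d_model: int, n_heads: int = 8, window: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.window = window

    def forward(self, x):
        t = x.size(1)
        idx = torch.arange(t, device=x.device)
        # True = blocked: future tokens, or tokens farther back than the window.
        mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= self.window)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return x + out


class MambaStandIn(nn.Module):
    """Placeholder for a Mamba (selective SSM) block: a gated causal depthwise conv."""
    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4, padding=3, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x):
        h = self.conv(x.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + h * torch.sigmoid(self.gate(x))


class SambaLayer(nn.Module):
    """One hybrid layer in the Samba pattern: Mamba -> MLP -> SWA -> MLP."""
    def __init__(self, d_model: int):
        super().__init__()
        self.blocks = nn.Sequential(
            MambaStandIn(d_model),
            MLPBlock(d_model),
            SlidingWindowAttention(d_model),
            MLPBlock(d_model),
        )

    def forward(self, x):
        return self.blocks(x)


if __name__ == "__main__":
    x = torch.randn(1, 16, 256)        # (batch, seq_len, d_model)
    print(SambaLayer(256)(x).shape)    # torch.Size([1, 16, 256])
```

The intuition behind the pattern is that the Mamba block carries long-range state across the whole sequence while the sliding window attention handles precise local retrieval, with MLPs in between for per-token mixing.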
From ordagan's post:
We are thrilled to announce Jamba, the world's first production-grade Mamba-based model.
Key Features:
- First production-grade Mamba-based model, built on a novel SSM-Transformer hybrid architecture
- 3x throughput on long contexts compared to Mixtral 8x7B
- Democratizes access to a massive 256K context window
- The only model in its size class that fits up to 140K context on a single GPU
Jamba is based on a novel architecture that combines Mamba and Transformer. While our initial results show great efficiency gains, we expect this to be further explored and improved with the help of the community.
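For anyone who wants to try it, here is a minimal usage sketch with Hugging Face transformers. It assumes a transformers release with Jamba support, the accelerate package installed for `device_map="auto"`, and enough memory for the checkpoint; the prompt text is just an example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"  # public checkpoint on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # spread the weights across available devices
)

inputs = tokenizer("A long-context model like Jamba can", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```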