
Cong Yu

Benyucong

AI & ML interests

Machine Learning Systems

Recent Activity

liked a dataset 26 days ago
linuzj/graph-data-quantum
liked a model about 1 month ago
Skywork/Skywork-R1V-38B
liked a model 6 months ago
meta-llama/Llama-3.1-8B-Instruct

Organizations

University of Southern California

Benyucong's activity

upvoted an article 10 months ago

Introduction to 3D Gaussian Splatting

reacted to akhaliq's post with ❤️ about 1 year ago
Jamba: A Hybrid Transformer-Mamba Language Model (2403.19887)

We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.
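The interleaving the abstract describes can be sketched as a layer schedule: mostly Mamba layers with an occasional attention layer, and an MoE feed-forward block substituted on a fixed cadence. This is a minimal illustrative sketch, not the authors' code; the specific ratios below (one attention layer per 8 layers, MoE on every second layer) are assumptions chosen to match the paper's described configuration and the function name is hypothetical.

```python
# Hedged sketch of a Jamba-style hybrid layer schedule.
# Each layer pairs a sequence mixer ("attention" or "mamba") with a
# feed-forward block ("moe" or a dense "mlp").

def build_layer_schedule(n_layers=32, attn_every=8, moe_every=2):
    """Return a list of (mixer, feed_forward) descriptors.

    attn_every: place an attention layer every `attn_every` layers,
                Mamba layers everywhere else (assumed ratio).
    moe_every:  replace the dense MLP with an MoE block on every
                `moe_every`-th layer (assumed cadence).
    """
    schedule = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_every == 0 else "mamba"
        ffn = "moe" if i % moe_every == 1 else "mlp"
        schedule.append((mixer, ffn))
    return schedule

schedule = build_layer_schedule()
# With these assumed settings: 4 attention layers, 28 Mamba layers,
# and 16 MoE blocks out of 32 layers.
```

Because MoE only activates a few experts per token, total parameter count can grow (raising capacity) while the active parameters per forward pass stay roughly constant, which is how the paper's configuration fits on a single 80GB GPU.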