Meta has unveiled its Llama 4 🦙 family of models, featuring native multimodality and a mixture-of-experts architecture. Two models are available now, with a third on the way:
Models🤗: meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Blog Post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
HF's Blog Post: https://huggingface.co/blog/llama4-release
- 🧠 Native Multimodality - Process text and images in a unified architecture
- 🔍 Mixture-of-Experts - The first Llama models to use MoE, activating only a fraction of parameters per token for much better efficiency
- 📏 Super Long Context - Up to 10M tokens
- 🌐 Multilingual Power - Trained on 200 languages with 10x more multilingual tokens than Llama 3 (including over 100 languages with over 1 billion tokens each)
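The MoE point above is why "17B active parameters" matters: a router selects a few experts per token, so per-token compute scales with active parameters rather than the full total. A minimal, illustrative top-k routing sketch (toy dimensions and a generic router, not Llama 4's actual implementation):

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=1):
    """Route one token through a top-k mixture-of-experts layer.

    x: (d,) token hidden state
    gate_w: (d, n_experts) router weights
    experts: list of callables, each mapping (d,) -> (d,)
    Only top_k experts run per token, so active compute stays small
    even when total parameter count is large.
    """
    logits = x @ gate_w                # router score per expert
    top = np.argsort(logits)[-top_k:]  # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 4 experts, each a small linear map; only 1 runs per token.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_w = rng.normal(size=(d, n_experts))
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: W @ x for W in expert_mats]
y = moe_forward(rng.normal(size=d), gate_w, experts, top_k=1)
print(y.shape)  # (8,)
```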
🔹 Llama 4 Scout
- 17B active parameters (109B total)
- 16 experts architecture
- 10M context window
- Fits on a single H100 GPU
- Beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1
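The single-H100 claim for Scout assumes low-precision weights (the HF release blog discusses int4 quantization). A back-of-envelope check of weight memory, assuming 4 bits per parameter and ignoring activations and KV cache:

```python
def weight_gb(n_params_billion, bits_per_param):
    """Approximate weight memory in GB for a model of the given size."""
    return n_params_billion * 1e9 * bits_per_param / 8 / 1e9

scout_total_b = 109  # Llama 4 Scout total parameters (billions)
print(f"bf16: {weight_gb(scout_total_b, 16):.1f} GB")  # ~218 GB: needs multiple GPUs
print(f"int4: {weight_gb(scout_total_b, 4):.1f} GB")   # ~54.5 GB: under an H100's 80 GB
```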
🔹 Llama 4 Maverick
- 17B active parameters (400B total)
- 128 experts architecture
- Fits on a single DGX H100 node (8x H100)
- 1M context window
- Outperforms GPT-4o and Gemini 2.0 Flash
- Elo score of 1417 on LMArena, currently the second-best model on the arena
🔹 Llama 4 Behemoth (Coming Soon)
- 288B active parameters (2T total)
- 16 experts architecture
- Teacher model for Scout and Maverick
- Outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks