Mert Erbak's picture

Mert Erbak PRO

merterbak

AI & ML interests

NLP and Image Processing

Recent Activity

updated a Space about 5 hours ago
merterbak/grok
updated a Space about 5 hours ago
merterbak/gemma-3
updated a Space about 5 hours ago
merterbak/DeepSeek-R1-Distill-Qwen-1.5B
View all activity

Organizations

Open-Source AI Meetup's profile picture MLX Community's profile picture Social Post Explorers's profile picture Hugging Face Discord Community's profile picture open/ acc's profile picture AI Starter Pack's profile picture

Posts 4

view post
Post
2852
Meta has unveiled its Llama 4 πŸ¦™ family of models, featuring native multimodality and mixture-of-experts architecture. Two model families are available now:
ModelsπŸ€—: meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Blog Post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
HF's Blog Post: https://huggingface.co/blog/llama4-release

- 🧠 Native Multimodality - Process text and images in a unified architecture
- πŸ” Mixture-of-Experts - First Llama models using MoE for incredible efficiency
- πŸ“ Super Long Context - Up to 10M tokens
- 🌐 Multilingual Power - Trained on 200 languages with 10x more multilingual tokens than Llama 3 (including over 100 languages with over 1 billion tokens each)

πŸ”Ή Llama 4 Scout
- 17B active parameters (109B total)
- 16 experts architecture
- 10M context window
- Fits on a single H100 GPU
- Beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1

πŸ”Ή Llama 4 Maverick
- 17B active parameters (400B total)
- 128 experts architecture
- It can fit perfectly on DGX H100(8x H100)
- 1M context window
- Outperforms GPT-4o and Gemini 2.0 Flash
- ELO score of 1417 on LMArena currently second best model on arena

πŸ”Ή Llama 4 Behemoth (Coming Soon)
- 288B active parameters (2T total)
- 16 experts architecture
- Teacher model for Scout and Maverick
- Outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks