Meta has released Llama 4 Scout and Llama 4 Maverick, now available on Hugging Face:
• Llama 4 Scout: 17B active parameters, 16-expert Mixture of Experts (MoE) architecture, 10M token context window, fits on a single H100 GPU.
• Llama 4 Maverick: 17B active parameters, 128-expert MoE architecture, 1M token context window, optimized for DGX H100 systems.
🔥 Key Features:
• Native Multimodality: Seamlessly processes text and images.
• Extended Context Window: Up to 10 million tokens (on Scout) for handling extensive inputs.
• Multilingual Support: Trained on 200 languages, with fine-tuning support for 12, including Arabic, Spanish, and German.
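To make the multimodal input format concrete, here is a minimal sketch of preparing a mixed image-and-text chat turn with the transformers processor. The checkpoint id and image URL are placeholders for illustration; check the model card on the Hub for the exact names.

```python
from transformers import AutoProcessor

# Assumed checkpoint id; the repo is gated, so accept the license and run `huggingface-cli login` first.
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
processor = AutoProcessor.from_pretrained(model_id)

# A single user turn mixing an image and a text question (the URL is a placeholder).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# The chat template converts the mixed content into model-ready tensors.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
print(inputs["input_ids"].shape)
```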
🛠️ Access and Integration:
• Model Checkpoints: Available under the meta-llama organization on the Hugging Face Hub.
• Transformers Compatibility: Fully supported in transformers v4.51.0 for easy loading and fine-tuning.
• Efficient Deployment: Supports tensor parallelism and automatic device mapping.
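As a rough sketch of the deployment side, the snippet below loads Scout with automatic device mapping and prints where the layers were placed. The Llama4ForConditionalGeneration class follows the transformers v4.51.0 release; the checkpoint id and dtype choice are assumptions to verify against the model card.

```python
import torch
from transformers import Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed checkpoint id

# device_map="auto" spreads the layers across all visible GPUs (and CPU, if needed).
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Inspect how modules were distributed, e.g. when a single H100 lacks headroom.
print(model.hf_device_map)

# For multi-GPU tensor parallelism, recent transformers versions also accept
# tp_plan="auto" in from_pretrained when the script is launched with torchrun.
```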
These models offer developers enhanced capabilities for building sophisticated, multimodal AI applications. 
On competitive coding benchmarks, we find that the OlympicCoder models outperform Claude 3.7 Sonnet, as well as models over 100x larger 💪
Together with the models, we are releasing:
📊 CodeForces-CoTs: a new dataset of code problems from the most popular competitive coding platform, with DeepSeek-R1 traces in C++ and Python (open-r1/codeforces-cots — loading sketch below)
🏆 IOI'2024: a new benchmark of VERY hard problems from the 2024 International Olympiad in Informatics, where even frontier models struggle to match human performance (open-r1/ioi)
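Both releases are regular Hugging Face datasets, so they can be pulled with the datasets library. A minimal loading sketch, which discovers the available configs rather than assuming their names; check the dataset cards for the recommended subsets and splits.

```python
from datasets import get_dataset_config_names, load_dataset

# Both repos ship several subsets, so list the configs first.
cots_configs = get_dataset_config_names("open-r1/codeforces-cots")
ioi_configs = get_dataset_config_names("open-r1/ioi")
print(cots_configs, ioi_configs)

# Load one subset of the CodeForces-CoTs data; picking the first config is
# just for illustration — choose the subset that matches your use case.
cots = load_dataset("open-r1/codeforces-cots", cots_configs[0])
print(cots)
```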