The Synthetic Data Playbook: Generating Trillions of the Finest Tokens 📝 • Explore synthetic data experiments on a virtual bookshelf • 220
laion/CLIP-ViT-bigG-14-laion2B-39B-b160k Zero-Shot Image Classification • Updated Jan 22, 2025 • 70.8k • 310
The Smol Training Playbook 📚 • The secrets to building world-class LLMs • 3.11k
google/siglip2-large-patch16-512 Zero-Shot Image Classification • 0.9B • Updated Feb 21, 2025 • 12.3k • 20
google/siglip-large-patch16-384 Zero-Shot Image Classification • 0.7B • Updated Sep 26, 2024 • 64.1k • 11
nomic-ai/nomic-embed-vision-v1.5 Image Feature Extraction • 92.9M • Updated Mar 31, 2025 • 143k • 217
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining Paper • 2508.10975 • Published Aug 14, 2025 • 60