🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do 3 days ago • 37
Structural Problems in AI Benchmarking and the Case for a Unified Evaluation Framework 5 days ago • 12
FINAL Bench: World's First Functional Metacognition Benchmark. "Not how much AI knows, but whether it knows what it doesn't know, and can fix it."
Dataset: FINAL-Bench/Metacognitive
Leaderboard: FINAL Bench 'Metacognitive'