Llama models (arguably the most successful open AI models of all time) accounted for just 3% of total model downloads on Hugging Face in March.

People and the media like winner-takes-all stories of one model or company to rule them all, but the reality is much more nuanced than that!

Kudos to all the small AI builders out there!

New activity in dicta-il/dictabert-large-char-menaked 12 days ago

Different variations of diacritics

#3 opened 12 days ago by

thewh1teagle

updated a Space 12 days ago

Medieval Yolo

🐠

Analyze medieval manuscript images to detect and label objects

updated a model 12 days ago

johnlockejrr/medieval-manuscript-yolov11-seg

Object Detection • Updated 10 days ago

New activity in thewh1teagle/add-diacritics-in-hebrew 12 days ago

Cool for Modern Hebrew but not for Rabbinic or Biblical Hebrew

#1 opened 12 days ago by

johnlockejrr

reacted to fdaudens's post with 🔥 12 days ago

Did we just drop personalized AI evaluation?! This tool auto-generates custom benchmarks on your docs to test which models are the best.

Most benchmarks test general capabilities, but what matters is how models handle your data and tasks. YourBench helps answer critical questions like:
- Do you really need a sledgehammer of a model, with hundreds of billions of parameters, to crack a nut?
- Could a smaller, fine-tuned model work better?
- How well do different models understand your domain?

Some cool features:
📚 Generates custom benchmarks from your own documents (PDFs, Word, HTML)
🎯 Tests models on real tasks, not just general capabilities
🔄 Supports multiple models for different pipeline stages
🧠 Generates both single-hop and multi-hop questions
🔍 Evaluate top models and deploy leaderboards instantly
💰 Full cost analysis to optimize for your budget
🛠️ Fully configurable via a single YAML file

26 SOTA models tested for question generation. Interesting finding: Qwen2.5 32B leads in question diversity, while smaller Qwen models and Gemini 2.0 Flash offer great value for cost.

You can also run it locally with any model you want.

I'm impressed. Try it out: yourbench/demo