Did we just drop personalized AI evaluation?! This tool auto-generates custom benchmarks on your docs to test which models are the best.
Most benchmarks test general capabilities, but what matters is how models handle your data and tasks. YourBench helps answer critical questions like: - Do you really need a hundreds-of-billions-parameter model sledgehammer to crack a nut? - Could a smaller, fine-tuned model work better? - How well do different models understand your domain?
Some cool features: 📚 Generates custom benchmarks from your own documents (PDFs, Word, HTML) 🎯 Tests models on real tasks, not just general capabilities 🔄 Supports multiple models for different pipeline stages 🧠 Generate both single-hop and multi-hop questions 🔍 Evaluate top models and deploy leaderboards instantly 💰 Full cost analysis to optimize for your budget 🛠️ Fully configurable via a single YAML file
26 SOTA models tested for question generation. Interesting finding: Qwen2.5 32B leads in question diversity, while smaller Qwen models and Gemini 2.0 Flash offer great value for cost.
You can also run it locally on any models you want.
The new DeepSite space is really insane for vibe-coders enzostvs/deepsite
With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out-of-the-box and create any app and game in one-shot.
It feels so powerful to me, no more complex framework or under-the-hood prompt engineering to have a working text-to-app tool.
AI is eating the world and *open-source* AI is eating AI itself!
PS: and even more meta is that the DeepSite app and DeepSeek model are both fully open-source code => time to start recursively improve?
PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324
Want to vibecode with DeepSeek? Just spent 10 minutes with this space and created a full world indicators dashboard - literally just by describing what I wanted!
Anyone can now prototype and deploy projects instantly.
Want to ramp up your AI skills and start breaking bigger stories? With the Journalists on Hugging Face community, we're launching our first learn-together course!
We'll build AI classifiers that process months of data in minutes. How?
- Work through an interactive version of an excellent course developed by Ben Welsh and Derek Willis - Share findings and get help in our dedicated community channel - Build working classifiers you can use in your reporting today
No coding background needed - if you can write a ChatGPT or Claude prompt, you can do this. Journalists are already using these techniques to break stories, from uncovering hidden real estate deals to tracking unusual campaign spending.
Join us—it might give you your next big story!
Thanks to Ben and Derek for letting me adapt their excellent course into this interactive version!
👀 Multimodal > Mistral AI released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS) > with IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS) > SpatialLM is a new vision LM that outputs 3D bounding boxes, comes with 0.5B (QwenVL based) and 1B (Llama based) variants > SkyWork released SkyWork-R1V-38B, new vision reasoning model (OS)
💬 LLMs > NVIDIA released new Nemotron models in 49B and 8B with their post-training dataset > LG released EXAONE, new reasoning models in 2.4B, 7.8B and 32B > Dataset: Glaive AI released a new reasoning dataset of 22M+ examples > Dataset: NVIDIA released new helpfulness dataset HelpSteer3 > Dataset: OpenManusRL is a new agent dataset based on ReAct framework (OS) > Open-R1 team released OlympicCoder, new competitive coder model in 7B and 32B > Dataset: GeneralThought-430K is a new reasoning dataset (OS)
🖼️ Image Generation/Computer Vision > Roboflow released RF-DETR, new real-time sota object detector (OS) 🔥 > YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹 > Stability AI released Stable Virtual Camera, a new novel view synthesis model > Tencent released Hunyuan3D-2mini, new small and fast 3D asset generation model > ByteDance released InfiniteYou, new realistic photo generation model > StarVector is a new 8B model that generates svg from images > FlexWorld is a new model that expands 3D views (OS)
🎤 Audio > Sesame released CSM-1B new speech generation model (OS)
🤖 Robotics > NVIDIA released GR00T, new robotics model for generalized reasoning and skills, along with the dataset
🎥 Just tested Stability AI's Stable Virtual Camera - it turns a single photo into dynamic video with AI-powered camera movements! From static meeting room to cinematic sweeps. 🚀
Want to build useful newsroom tools with AI? We’re launching a Hugging Face x Journalism Slack channel where journalists turn AI concepts into real newsroom solutions.
Inside the community: ✅ Build open-source AI tools for journalism ✅ Get direct help from the community ✅ Stay updated on new models and datasets ✅ Learn from other journalists’ experiments and builds
The goal? Go from “I read about AI” to “I built an AI tool that supercharged my newsroom.” —no more learning in isolation.