All HF Hub posts

wolfram posted an update 2 days ago
Finally finished my extensive **Qwen 3 evaluations** across a range of formats and quantisations, focusing on **MMLU-Pro** (Computer Science).

A few take-aways stood out - especially for those interested in local deployment and performance trade-offs:

1️⃣ **Qwen3-235B-A22B** (via Fireworks API) tops the table at **83.66%** with ~55 tok/s.
2️⃣ But the **30B-A3B Unsloth** quant delivered **82.20%** while running locally at ~45 tok/s and with zero API spend.
3️⃣ The same Unsloth build is ~5x faster than Qwen's **Qwen3-32B**, which scores **82.20%** as well yet crawls at <10 tok/s.
4️⃣ On Apple silicon, the **30B MLX** port hits **79.51%** while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups.
5️⃣ The **0.6B** micro-model races above 180 tok/s but tops out at **37.56%** - that's why it's not even on the graph (50% performance cut-off).

All local runs were done with LM Studio on an M4 MacBook Pro, using Qwen's official recommended settings.
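
A minimal sketch of how such a local run can be queried, assuming LM Studio's OpenAI-compatible server on its default port (1234); the model identifier and sampling values are assumptions based on Qwen's published recommendations for thinking mode, so adjust them to your own setup:

```python
# Query a local Qwen3 quant served by LM Studio's OpenAI-compatible server.
# Model name and sampling values are assumptions; match them to your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="qwen3-30b-a3b",  # whatever identifier LM Studio shows for your quant
    messages=[{"role": "user", "content": "Explain quicksort's average-case complexity."}],
    temperature=0.6,   # Qwen's suggested thinking-mode sampling
    top_p=0.95,
    max_tokens=2048,
)
print(response.choices[0].message.content)
```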

**Conclusion:** Quantised 30B models now get you ~98% of frontier-class accuracy - at a fraction of the latency, cost, and energy. For most local RAG or agent workloads, they're not just good enough - they're the new default.

Well done, Qwen - you really whipped the llama's ass! And to OpenAI: for your upcoming open model, please make it MoE, with toggleable reasoning, and release it in many sizes. *This* is the future!
DawnC posted an update 1 day ago
VisionScout — Now with Video Analysis! 🚀

I’m excited to announce a major update to VisionScout, my interactive vision tool that now supports VIDEO PROCESSING, in addition to powerful object detection and scene understanding!

⭐️ NEW: Video Analysis Is Here!
🎬 Upload any video file to detect and track objects using YOLOv8.
⏱️ Customize processing intervals to balance speed and thoroughness.
📊 Get comprehensive statistics and summaries showing object appearances across the entire video.
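
A rough sketch of the idea behind the video mode, assuming ultralytics YOLOv8 and OpenCV; the function and parameter names are illustrative, not VisionScout's actual implementation:

```python
# Run YOLOv8 on every Nth frame of a video and tally object appearances.
from collections import Counter

import cv2
from ultralytics import YOLO

def summarize_video(path: str, frame_interval: int = 10) -> Counter:
    model = YOLO("yolov8m.pt")  # Medium model; swap for the Nano or XLarge weights
    counts: Counter = Counter()
    cap = cv2.VideoCapture(path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % frame_interval == 0:  # processing interval trades speed for coverage
            results = model(frame, verbose=False)[0]
            for cls_id in results.boxes.cls.tolist():
                counts[model.names[int(cls_id)]] += 1
        frame_idx += 1
    cap.release()
    return counts

print(summarize_video("clip.mp4", frame_interval=15))
```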

What else can VisionScout do?

🖼️ Analyze any image and detect 80 object types with YOLOv8.
🔄 Switch between Nano, Medium, and XLarge models for speed or accuracy.
🎯 Filter by object classes (people, vehicles, animals, etc.) to focus on what matters.
📊 View detailed stats on detections, confidence levels, and distributions.
🧠 Understand scenes — interpreting environments and potential activities.
⚠️ Automatically identify possible safety concerns based on detected objects.
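
Class filtering and model switching can be sketched in a few lines with ultralytics YOLOv8 (class indices follow the COCO convention, e.g. 0 = person, 2 = car); again, this is an illustration rather than VisionScout's own code:

```python
from ultralytics import YOLO

fast_model = YOLO("yolov8n.pt")      # Nano: fastest
accurate_model = YOLO("yolov8x.pt")  # XLarge: most accurate

# Detect only people and cars, keeping reasonably confident boxes.
results = accurate_model("street.jpg", classes=[0, 2], conf=0.4)[0]
for box in results.boxes:
    label = accurate_model.names[int(box.cls)]
    print(f"{label}: confidence {float(box.conf):.2f}")
```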

What’s coming next?
🔎 Expanding YOLO’s object categories.
⚡ Faster real-time performance.
📱 Improved mobile responsiveness.

My goal:
To bridge the gap between raw detection and meaningful interpretation.
I’m constantly exploring ways to help machines not just "see" but truly understand context — and to make these advanced tools accessible to everyone, regardless of technical background.

Try it now! 🖼️👉 DawnC/VisionScout

If you enjoy VisionScout, a ❤️ like on this project or some feedback would mean a lot and keep me motivated to keep building and improving!

#ComputerVision #ObjectDetection #VideoAnalysis #YOLO #SceneUnderstanding #MachineLearning #TechForLife
giadap posted an update 2 days ago
Ever notice how some AI assistants feel like tools while others feel like companions? Turns out, it's not always about fancy tech upgrades; sometimes it's just clever design.

Our latest blog post at Hugging Face dives into how minimal design choices can completely transform how users experience AI. We've seen our community turn the same base models into everything from swimming coaches to interview prep specialists with surprisingly small tweaks.

The most fascinating part? When we tested identical models with different "personalities" in our Inference Playground, the results were mind-blowing.

Want to experiment yourself? Our Inference Playground lets anyone (yes, even non-coders!) test these differences in real-time. You can:

- Compare multiple models side-by-side
- Customize system prompts
- Adjust parameters like temperature
- Test multi-turn conversations

It's fascinating how a few lines of instruction text can transform the same AI from strictly professional to seemingly caring and personal, without changing a single line of code in the model itself.
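
A small sketch of that point using huggingface_hub's InferenceClient: the same model, two different system prompts. The model ID and prompts here are just examples, not the ones from the blog post:

```python
from huggingface_hub import InferenceClient

client = InferenceClient()
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative choice

personas = {
    "professional": "You are a concise, formal technical assistant.",
    "companion": "You are a warm, encouraging coach who celebrates small wins.",
}

for name, system_prompt in personas.items():
    reply = client.chat_completion(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "I keep procrastinating on my thesis. Any advice?"},
        ],
        max_tokens=200,
        temperature=0.7,
    )
    print(f"--- {name} ---\n{reply.choices[0].message.content}\n")
```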

Read more here: https://huggingface.co/blog/giadap/ai-personas
RiverZ posted an update 3 days ago
🔥 We're thrilled to share some exciting news about ICEdit! The ICEdit app ( RiverZ/ICEdit) has soared to second place on the Hugging Face Spaces weekly trending list, just behind Qwen3. What's more, it also holds second position on the overall Spaces trending list. This achievement wouldn't have been possible without your incredible support and love. A huge thank you to each and every one of you ❤!

🎉 The ICEdit community has been incredibly active, and we've seen a plethora of amazing ComfyUI workflows being shared. For instance, with the help of ComfyUI-nunchaku, you can run ICEdit locally with just 4GB of VRAM. This makes it much more accessible for those with limited hardware resources.

🎇 If you're interested in the detailed information, please head over to our repository. We highly encourage you to give these workflows a try and explore the creative possibilities that ICEdit offers.

Github Repo: https://github.com/River-Zhang/ICEdit
Hugging Face Space: RiverZ/ICEdit
VirtualOasis posted an update 3 days ago
Agents vs. Workflows
Agents are systems where LLMs dynamically direct their processes and tool usage, maintaining control over how they accomplish tasks.
Workflows are systems where LLMs and tools are orchestrated through predefined code paths, ensuring that each step is executed in a deterministic manner.

Agents are like smart assistants that can think on their own. They understand situations, make decisions, and act, even when the task is new or unpredictable. Think of an agent as a chef who can make a meal from whatever ingredients they have.

Workflows are like a recipe with fixed steps. They’re a series of tasks done in order, like following a checklist for approving a loan. They’re great for tasks that don’t change much.
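
A toy contrast between the two patterns; `call_llm` and the tools are hypothetical stand-ins, and the point is simply who decides the next step:

```python
def search_web(query: str) -> str:
    return f"[search results for: {query}]"

def summarize(text: str) -> str:
    return text[:200]

def call_llm(prompt: str) -> str:
    # Stand-in for any chat model call.
    return "finish: (model decision goes here)"

# Workflow: the code fixes the steps, like a recipe.
def workflow(question: str) -> str:
    hits = search_web(question)
    return summarize(hits)

# Agent: the LLM decides which tool to use next, until it declares it is done.
def agent(question: str, max_steps: int = 5) -> str:
    context = question
    for _ in range(max_steps):
        decision = call_llm(
            f"Task: {context}\nTools: search_web, summarize, finish.\n"
            "Reply with the next tool to use, or 'finish: <answer>'."
        )
        if decision.startswith("finish:"):
            return decision.removeprefix("finish:").strip()
        if decision.startswith("search_web"):
            context += "\n" + search_web(question)
        elif decision.startswith("summarize"):
            context = summarize(context)
    return context
```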
nomadicsynth posted an update 2 days ago
I Did a Thing!

I made an embedding model to find answers in research papers. It goes beyond plain "semantic search" by identifying deeply reasoned connections and interdisciplinary insights that might have been overlooked. The goal is to surface solutions that might have been missed and to uncover answers that are already out there.
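
The general shape of embedding-based retrieval over abstracts looks roughly like this (a generic sentence-transformers sketch, not the Inkling model itself):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder

abstracts = [
    "We propose a sparse attention mechanism that scales to long documents.",
    "A study of catalyst degradation in proton-exchange membrane fuel cells.",
    "Curriculum learning improves sample efficiency in reinforcement learning.",
]
query = "How can transformers handle very long inputs efficiently?"

doc_emb = model.encode(abstracts, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_emb)[0]

for score, abstract in sorted(zip(scores.tolist(), abstracts), reverse=True):
    print(f"{score:.3f}  {abstract}")
```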

I’ve set up a demo Space: nomadicsynth/inkling. It’s early days, and I’d love some feedback on the model’s results. Try it out and let me know what you think!

Oh, and if it finds your Nobel-winning answer, I want a cut! 😉
sequelbox posted an update 3 days ago
NEW RELEASE: Esper 3 for Qwen 3!

- A full-stack software assistant: a reasoning finetune focused on coding, architecture, and DevOps using the Titanium and Tachibana datasets!
- Improved general and creative reasoning skills, powered by the Raiden dataset.

4B model: ValiantLabs/Qwen3-4B-Esper3
8B model: ValiantLabs/Qwen3-8B-Esper3
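
A minimal sketch of loading the 4B release with transformers; the generation settings are generic defaults, not the authors' recommended configuration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ValiantLabs/Qwen3-4B-Esper3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Sketch a CI pipeline for a Python monorepo."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```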

We'll also be bringing Esper 3 to larger Qwen 3 models as soon as we can - if you want these, consider helping us out: sequelbox/SupportOpenSource

More models and datasets to come soon!

with my love and enthusiasm,
allegra
merve posted an update 3 days ago
A ton of impactful models and datasets landed in open AI this past week; let's summarize the best 🤩 merve/releases-apr-21-and-may-2-6819dcc84da4190620f448a3

💬 Qwen made it rain! They released Qwen3: new dense and MoE models ranging from 0.6B to 235B 🤯 as well as Qwen2.5-Omni, an any-to-any model in 3B and 7B sizes!
> Microsoft AI released Phi4 reasoning models (that also come in mini and plus sizes)
> NVIDIA released new CoT reasoning datasets
🖼️ > ByteDance released UI-TARS-1.5, a native multimodal UI-parsing agentic model
> Meta released EdgeTAM, an on-device object tracking model (SAM2 variant)
🗣️ NVIDIA released parakeet-tdt-0.6b-v2, a smol 600M automatic speech recognition model
> Nari released Dia, a 1.6B text-to-speech model
> Moonshot AI released Kimi Audio, a new audio understanding, generation, conversation model
👩🏻‍💻 JetBrains released Mellum models in base and SFT variants for coding
> Tesslate released UIGEN-T2-7B, a new text-to-frontend-code model 🤩
juhoinkinen posted an update 1 day ago
We ( @osma , @MonaLehtinen and I, i.e. the Annif team at the National Library of Finland) recently took part in the LLMs4Subjects challenge at the SemEval-2025 workshop. The task was to use large language models (LLMs) to generate good-quality subject indexing for bibliographic records, i.e. titles and abstracts.

We are glad to report that our system performed well; it was ranked

🥇 1st in the category where the full vocabulary was used
🥈 2nd in the smaller vocabulary category
🏅 4th in the qualitative evaluations.

14 participating teams developed their own solutions for generating subject headings and the output of each system was assessed using both quantitative and qualitative evaluations. Research papers about most of the systems are going to be published around the time of the workshop in late July, and many pre-prints are already available.

We applied Annif together with several LLMs, which we used to preprocess the data sets: translating the GND vocabulary terms into English, translating bibliographic records into English and German as required, and generating additional synthetic training data. After preprocessing, we used the traditional machine learning algorithms in Annif as well as the experimental XTransformer algorithm, which is based on language models. We also combined the subject suggestions generated from the English- and German-language records in a novel way.
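
For context, querying a running Annif instance for suggestions looks roughly like this; the REST endpoint shape, port, and project ID are assumptions from memory of Annif's API and may not match the exact setup used for the challenge:

```python
import requests

ANNIF_URL = "http://localhost:5000/v1"   # assumed local Annif web server
PROJECT_ID = "llms4subjects-en"          # hypothetical project name

record = (
    "Title: Deep learning for automated subject indexing. "
    "Abstract: We study transformer-based approaches to assigning GND subject headings."
)

resp = requests.post(
    f"{ANNIF_URL}/projects/{PROJECT_ID}/suggest",
    data={"text": record, "limit": 10},
)
resp.raise_for_status()
for hit in resp.json()["results"]:
    print(f'{hit["score"]:.3f}  {hit["label"]}')
```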

More information can be found in our system description preprint: Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs (2504.19675)

See also the task description preprint: SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog (2504.07199)

The Annif models trained for this task are available here: NatLibFi/Annif-LLMs4Subjects-data