clem (Clem 🤗)

reacted to danielhanchen's post with 👀🔥 3 months ago

Post

2804

A new way to use Unsloth.

Coming soon...

reacted to ronantakizawa's post with 🔥 7 months ago

Post

2547

Introducing the japanese-trending-words dataset: a dataset consisting of 593 words from Japan’s annual trending word rankings (流行語大賞) from 2006-2025. It provides the top 30 words from each year, with their meanings in Japanese and English. This resource is awesome for NLP tasks that involve understanding recent Japanese culture and history.

ronantakizawa/japanese-trending-words

#japanese #japanesedataset #trending

reacted to sergiopaniego's post with 🧠 7 months ago

Post

4042

you gotta go fast and go read the latest blog by @ror et al. explaining Continuous Batching in depth

https://huggingface.co/blog/continuous_batching

reacted to unmodeled-tyler's post with ❤️🚀 7 months ago

Post

712

New Datasets Published:
vanta-research/poetic-imagery-small
vanta-research/excitement-small

We are open sourcing two of our datasets today, which were used in the training of Apollo Astralis 8B and 4B.

The first dataset, poetic-imagery-small, is designed to give the model's responses a bit of "depth" in order to encourage curiosity and thought from the user.

Additionally, the excitement-small dataset is designed to teach the model how to use "excited" language conversationally. This dataset was used on both Apollo Astralis models, which effectively demonstrate general excitement during user interaction.

VANTA Research is an AI safety project which aims to research and develop language models aligned for all types of thinking. These datasets were created in line with that mission and with rigorous AI safety standards.

reacted to nouamanetazi's post with ❤️🚀👍🤗 7 months ago

Post

4932

After training 𝐒𝐦𝐨𝐥𝐋𝐌𝟑 on 𝟑𝟖𝟒 𝐇𝟏𝟎𝟎𝐬 for nearly a month, I've come to realize something most people overlook: 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐦𝐚𝐤𝐞-𝐨𝐫-𝐛𝐫𝐞𝐚𝐤 𝐟𝐚𝐜𝐭𝐨𝐫 𝐢𝐧 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious 𝐍𝐂𝐂𝐋 𝐞𝐫𝐫𝐨𝐫𝐬, or when your expensive GPU cluster is running at 𝟔𝟎% 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲, the problem isn't your model. It's most probably a 𝐦𝐢𝐬𝐮𝐬𝐞 𝐨𝐟 𝐭𝐡𝐞 𝐡𝐚𝐫𝐝𝐰𝐚𝐫𝐞. 🛠️

Questions that seemed simple but had no clear answers: Why is 𝐌𝐨𝐄 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐥𝐨𝐰𝐞𝐫 𝐭𝐡𝐚𝐧 𝐝𝐞𝐧𝐬𝐞 𝐦𝐨𝐝𝐞𝐥𝐬? Which 𝐍𝐂𝐂𝐋 𝐟𝐥𝐚𝐠𝐬 should we actually set? How often should we checkpoint without killing throughput?

That's why we built 𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐥𝐚𝐲𝐞𝐫 that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: 𝐇𝐁𝐌𝟑 𝐡𝐢𝐭𝐭𝐢𝐧𝐠 𝟑 𝐓𝐁/𝐬, 𝐍𝐕𝐋𝐢𝐧𝐤 𝟒.𝟎 𝐫𝐞𝐚𝐜𝐡𝐢𝐧𝐠 𝟕𝟖𝟔 𝐆𝐁/𝐬, 𝐏𝐂𝐈𝐞 𝐆𝐞𝐧𝟒 𝐚𝐭 𝟏𝟒.𝟐 𝐆𝐁/𝐬. Then we ran collective operations across 𝟏𝟐𝟖 𝐆𝐏𝐔𝐬 (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 𝟒𝟖𝟎 𝐆𝐁/𝐬 on a single node to 𝟑𝟐𝟎-𝟑𝟓𝟎 𝐆𝐁/𝐬 across 16 nodes.
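To put the all-reduce figures above in perspective, here is a minimal sketch that converts them into the fraction of single-node bandwidth retained at 16 nodes. The numbers are the ones quoted in the post; the variable names are illustrative only and are not taken from the playbook.

```python
# Bandwidth figures quoted in the post (GB/s).
single_node_gb_per_s = 480.0
multi_node_range_gb_per_s = (320.0, 350.0)  # measured across 16 nodes

# Fraction of the single-node all-reduce bandwidth that survives at scale.
retained = tuple(bw / single_node_gb_per_s for bw in multi_node_range_gb_per_s)

for bw, frac in zip(multi_node_range_gb_per_s, retained):
    print(f"{bw:.0f} GB/s is {frac:.0%} of the single-node figure")
```

In other words, the quoted range corresponds to keeping roughly two thirds to three quarters of the single-node all-reduce bandwidth.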

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team

reacted to abidlabs's post with 🔥 7 months ago

Post

11426

Why I think local, open-source models will eventually win.

The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code, computer-control agents that click, type, and test repeatedly.

In these cases, the power of the model lies not in how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.

An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly “smarter” closed model that has to make remote API calls for every move.

Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won’t accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are “good enough” and the expectation will shift toward everything running locally. It’ll happen sooner than most people think.

8 replies

reacted to flozi00's post with ❤️ 7 months ago

Post

3209

Some weeks ago, I decided it was time for me to leave LinkedIn.
Things had gone quiet around my open source activities over the last year, so I thought something had to change.

That's why my focus will move to sharing experiences and insights about hardware, drivers, kernels and Linux. I won't post about how to use models, build agents or do prompting. I want to write about the deeper layers that the current hype is built on.

I will start posting summaries of my articles here on the hub.

English version:
https://flozi.net/en

German translated version:
https://flozi.net/de

Feel free to reach out to me if you want to read about something specific.

2 replies

reacted to DheemanthReddy's post with ❤️ 7 months ago

Post

1252

We just released Maya-1-Voice, an open source voice AI model with voice design and emotions.

Describe voices in natural language. Add 20+ emotions like <laugh>, <cry>, <whisper> inline. 3B parameters, production-ready, runs on a single GPU with vLLM.

Apache 2.0. Built on a Llama backbone, predicts SNAC codec tokens for real-time streaming.

Model: https://huggingface.co/maya-research/maya-1-voice

reacted to abidlabs's post with 👍 7 months ago

reacted to Kseniase's post with 🚀❤️👍 9 months ago

Post

6255

10 awesome advanced LoRA approaches

Low-Rank Adaptation (LoRA) is the go-to method for efficient model fine-tuning that adds small low-rank matrices instead of retraining full models. The field isn’t standing still – new LoRA variants push the limits of efficiency, generalization, and personalization. So we’re sharing 10 of the latest LoRA approaches you should know about:

1. Mixture-of-LoRA-experts → Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection (2509.13878)
Adds multiple low-rank adapters (LoRA) into a model’s layers, and a routing mechanism activates the most suitable ones for each input. This lets the model adapt better to new unseen conditions

2. Amortized Bayesian Meta-Learning for LoRA (ABMLL) → Amortized Bayesian Meta-Learning for Low-Rank Adaptation of Large Language Models (2508.14285)
Balances global and task-specific parameters within a Bayesian framework to improve uncertainty calibration and generalization to new tasks without high memory or compute costs

3. AutoLoRA → AutoLoRA: Automatic LoRA Retrieval and Fine-Grained Gated Fusion for Text-to-Image Generation (2508.02107)
Automatically retrieves and dynamically aggregates public LoRAs for stronger T2I generation

4. aLoRA (Activated LoRA) → Activated LoRA: Fine-tuned LLMs for Intrinsics (2504.12397)
Only applies LoRA after invocation, letting the model reuse the base model’s KV cache instead of recomputing the full turn’s KV cache. Efficient in multi-turn conversations

5. LiLoRA (LoRA in LoRA) → LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning (2508.06202)
Shares the LoRA matrix A across tasks and additionally low-rank-decomposes matrix B to cut parameters in continual vision-text MLLMs

6. Sensitivity-LoRA → Sensitivity-LoRA: Low-Load Sensitivity-Based Fine-Tuning for Large Language Models (2509.09119)
Dynamically assigns ranks to weight matrices based on their sensitivity, measured using second-order derivatives
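
As a rough illustration of the basic low-rank idea that all of these variants build on, here is a minimal pure-Python sketch of a plain LoRA update: the frozen weight W is combined with a scaled product of two small matrices, W + (alpha / r) * B @ A. The matrix sizes, the `alpha` value, and the helper names `matmul` and `lora_effective_weight` are hypothetical choices for this example; it does not implement any of the specific methods listed above.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the rank shared by A and B."""
    r = len(a)  # A has shape (r, k); B has shape (d, r)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical sizes: an 8x6 weight matrix adapted with rank 2.
d, k, r, alpha = 8, 6, 2, 4
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]      # frozen base weight
a = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # trainable, small random init
b = [[0.0] * r for _ in range(d)]                               # trainable, zero init

w_eff = lora_effective_weight(w, a, b, alpha)

full_params = d * k        # parameters touched by full fine-tuning of W
lora_params = r * (d + k)  # parameters in A and B combined
print(full_params, lora_params)
```

With B initialized to zero, the adapter starts as a no-op, so `w_eff` equals `w` until training updates A and B. The parameter saving grows with matrix size, since full fine-tuning scales with d * k while the adapter scales with r * (d + k).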

Read further below ↓
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

3 replies

reacted to vikhyatk's post with 🔥 9 months ago

Post

6244

Just released a preview of Moondream 3! moondream/moondream3-preview

This is a 9B parameter, 2B active MoE VLM with state of the art visual reasoning capabilities.

More details in the release blog post: https://moondream.ai/blog/moondream-3-preview

3 replies

replied to andywu-kby's post 9 months ago

hi!

reacted to meg's post with 👍 10 months ago

Post

3030

New work from my socially-minded colleagues at Hugging Face, creating some foundations for AI companionship behavior evaluation.
Evaluation Dataset: AI-companionship/INTIMA
Paper: AI-companionship/INTIMA
Work from @giadap , @frimelle , @yjernite .

2 replies

Clem 🤗 PRO

AI & ML interests

Recent Activity

Organizations

clem's activity