
Louis Brulé Naudet

louisbrulenaudet

AI & ML interests

Research in business taxation and development, University Dauphine-PSL 📖 | Backed by the Microsoft for Startups Hub program and Google Cloud Platform for startups program | Hugging Face for Legal 🤗


Organizations

MISATO-dataset, OpenVINO Toolkit, ONNXConfig for all, Gradio-Themes-Party, scikit-learn, Open-Source AI Meetup, BigLAM: BigScience Libraries, Archives and Museums, Université Dauphine-PSL, Stable Diffusion Dreambooth Concepts Library, Blog-explorers, OpenOrca, OpenLLM France, huggingPartyParis, Qwen, That Time I got Reincarnated as a Hugging Face Organization, ZeroGPU Explorers, Journalists on Hugging Face, Major TOM, MLX Community, Lemone, Social Post Explorers, Cognitive Computations, C4AI Community, Haiku, Dev Mode Explorers, Hugging Face for Legal, Hugging Face Discord Community, Dataset Tools, Data Is Better Together Contributor

louisbrulenaudet's activity

reacted to Smooke's post with 👀 4 days ago
AI Search Traffic Marketshare for Calling HackerNoon Blogs: 52% OpenAI, 30% Amazon & 18% Perplexity: https://hackernoon.com/ai-search-traffic-marketshare-for-calling-hackernoon-blogs-52percent-openai-30percent-amazon-and-18percent-perplexity

OpenAI (51.8%) leads AI search traffic market share, based on my analysis of end-user–initiated AI Assistant and AI Search requests to HackerNoon. While Amazon (30.4%) and Perplexity (17.9%) also secured significant portions of the market, the total volume of requests (1,915,670 in 30 days) and competition among AI search providers indicate increasing reliance on AI for information retrieval and presentation.

This analysis aggregates AI Assistant and AI Search queries to approximate end-user–initiated AI search traffic across HackerNoon URLs. Non-human traffic such as web crawlers, bots, and automated scripts has been filtered out to ensure the data reflects only human-initiated requests. The dataset reviewed comprises instances where AI systems recommended HackerNoon content in response to human queries. Between February 28 and March 28, 2025, HackerNoon received 1,915,670 AI-referred search requests. OpenAI accounted for 991,580 requests, Amazon for 581,990, and Perplexity for 342,100, according to the Cloudflare AI Audit tool, which currently tracks these top providers. HackerNoon serves a technical audience, so our data is well positioned to answer questions like: if you work in tech, which AI search engine do you rely on?
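The percentages quoted above follow directly from the raw request counts; a quick sanity check in Python (numbers taken from the post):

```python
# Recomputing the market-share percentages from the raw counts in the post.
requests = {"OpenAI": 991_580, "Amazon": 581_990, "Perplexity": 342_100}
total = sum(requests.values())  # 1,915,670 requests over 30 days

shares = {name: round(100 * n / total, 1) for name, n in requests.items()}
print(shares)  # {'OpenAI': 51.8, 'Amazon': 30.4, 'Perplexity': 17.9}
```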

Continue Reading... https://hackernoon.com/ai-search-traffic-marketshare-for-calling-hackernoon-blogs-52percent-openai-30percent-amazon-and-18percent-perplexity
reacted to mlabonne's post with 🔥 14 days ago
reacted to clem's post with 🔥 14 days ago
Nice new space to see how fast your personal or organization's followers are growing on HF:
julien-c/follow-history

As you can see, I still have more followers than @julien-c even if he's trying to change this by building such cool spaces 😝😝😝
reacted to Jaward's post with 🔥 14 days ago
reacted to merve's post with 🤗 14 days ago
So many open releases at Hugging Face this past week 🤯 recapping all of them here ⤵️ merve/march-21-releases-67dbe10e185f199e656140ae

👀 Multimodal
> Mistral AI released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS)
> with IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS)
> SpatialLM is a new vision LM that outputs 3D bounding boxes, comes with 0.5B (QwenVL based) and 1B (Llama based) variants
> SkyWork released SkyWork-R1V-38B, new vision reasoning model (OS)

💬 LLMs
> NVIDIA released new Nemotron models in 49B and 8B with their post-training dataset
> LG released EXAONE, new reasoning models in 2.4B, 7.8B and 32B
> Dataset: Glaive AI released a new reasoning dataset of 22M+ examples
> Dataset: NVIDIA released new helpfulness dataset HelpSteer3
> Dataset: OpenManusRL is a new agent dataset based on ReAct framework (OS)
> Open-R1 team released OlympicCoder, new competitive coder model in 7B and 32B
> Dataset: GeneralThought-430K is a new reasoning dataset (OS)

🖼️ Image Generation/Computer Vision
> Roboflow released RF-DETR, new real-time sota object detector (OS) 🔥
> YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹
> Stability AI released Stable Virtual Camera, a new novel view synthesis model
> Tencent released Hunyuan3D-2mini, new small and fast 3D asset generation model
> ByteDance released InfiniteYou, new realistic photo generation model
> StarVector is a new 8B model that generates svg from images
> FlexWorld is a new model that expands 3D views (OS)

🎤 Audio
> Sesame released CSM-1B new speech generation model (OS)

🤖 Robotics
> NVIDIA released GR00T, new robotics model for generalized reasoning and skills, along with the dataset

*OS ones have Apache 2.0 or MIT license
posted an update 14 days ago
I’ve just released logfire-callback on PyPI, designed to facilitate monitoring of Hugging Face Transformer training loops using Pydantic Logfire 🤗

The callback automatically logs the training start (with configuration parameters), periodic metrics, and training completion ⏱️

Install the package using pip:
pip install logfire-callback

First, ensure you have a Logfire API token and set it as an environment variable:
export LOGFIRE_TOKEN=your_logfire_token

Then use the callback in your training code:
from transformers import Trainer, TrainingArguments
from logfire_callback import LogfireCallback

# Initialize your model, dataset, etc.

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    # ... other training arguments
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    callbacks=[LogfireCallback()]  # Add the Logfire callback here
)

trainer.train()

If you have any feedback, please reach out to @louisbrulenaudet
reacted to m-ric's post with 🤗 20 days ago
reacted to clem's post with 🔥 about 2 months ago
What are the best organizations to follow on @huggingface ?

Off the top of my head:
- Deepseek (35,000 followers): deepseek-ai
- Meta Llama (27,000 followers): meta-llama
- Black Forest Labs (11,000 followers): black-forest-labs
- OpenAI (5,000 followers): openai
- Nvidia (16,000 followers): nvidia
- Microsoft (9,000 followers): microsoft
- AllenAI (2,000 followers): allenai
- Mistral (5,000 followers): mistralai
- XAI (600 followers): xai-org
- Stability AI (16,000 followers): stabilityai
- Qwen (16,000 followers): Qwen
- GoogleAI (8,000 followers): google
- Unsloth (3,000 followers): unsloth
- Bria AI (4,000 followers): briaai
- NousResearch (1,300 followers): NousResearch

Bonus, the agent course org with 17,000 followers: agents-course
reacted to davanstrien's post with 👍 about 2 months ago
reacted to m-ric's post with 👀 about 2 months ago
𝗚𝗿𝗲𝗮𝘁 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗮𝗹𝗲𝗿𝘁: you can now share agents to the Hub! 🥳🥳

And any agent pushed to the Hub gets a cool Space interface to chat with it directly.

This was a real technical challenge: for instance, serializing tools to export them meant you needed to get all the source code for a tool, verify that it was standalone (not relying on external variables), and gather all the packages required to make it run.
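smolagents' real serializer is more involved, but the "standalone" check described above can be sketched in a few lines (the function name and heuristic below are mine, not the library's): a function is exportable as plain source only if it closes over nothing and every global name it references is a builtin.

```python
import builtins

def is_standalone(fn) -> bool:
    """Heuristic: fn can be serialized as plain source text only if it has no
    closure cells and every global name it touches resolves to a builtin."""
    if fn.__closure__:
        return False  # depends on enclosing-scope variables
    return all(
        hasattr(builtins, name) or name not in fn.__globals__
        for name in fn.__code__.co_names
    )
```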

Go try it out! 👉 https://github.com/huggingface/smolagents
reacted to merve's post with 👍 about 2 months ago
Your weekly recap of open AI is here, and it's packed with models! merve/feb-14-releases-67af876b404cc27c6d837767

👀 Multimodal
> OpenGVLab released InternVideo 2.5 Chat models, new video LMs with long context
> AIDC released Ovis2 model family along with Ovis dataset, new vision LMs in different sizes (1B, 2B, 4B, 8B, 16B, 34B), with video and OCR support
> ColQwenStella-2b is a multilingual visual retrieval model that is sota in its size class
> Hoags-2B-Exp is a new multilingual vision LM with contextual reasoning, long context video understanding

💬 LLMs
A lot of math models!
> Open-R1 team released OpenR1-Math-220k, a large-scale math reasoning dataset, along with OpenR1-Qwen-7B, a Qwen2.5-Math fine-tune trained on it
> Nomic AI released a new Nomic Embed multilingual retrieval model, a MoE with ~500M total and 305M active params, outperforming other models
> DeepScaleR-1.5B-Preview is a new DeepSeek-R1-Distill fine-tune using distributed RL on math
> LIMO is a new fine-tune of Qwen2.5-32B-Instruct on Math

🗣️ Audio
> Zonos-v0.1 is a new family of text-to-speech models, which contains the model itself and embeddings

🖼️ Vision and Image Generation
> We have ported DepthPro of Apple to transformers for your convenience!
> illustrious-xl-v1.0 is a new illustration generation model
reacted to fffiloni's post with 🔥 about 2 months ago
reacted to clem's post with 🔥 about 2 months ago
We've crossed 1B+ tokens routed to our inference provider partners on HF, a feature we released just a few days ago.

Just getting started, of course, but early users seem to like it, and we're always happy to partner with cool startups in the ecosystem.

Have you been using any integration and how can we make it better?

https://huggingface.co/blog/inference-providers
reacted to m-ric's post with 🚀 about 2 months ago
Less is More for Reasoning (LIMO): a 32B model fine-tuned with 817 examples can beat o1-preview on math reasoning! 🤯

Do we really need o1's huge RL procedure to see reasoning emerge? It seems not.
Researchers from Shanghai Jiao Tong University just demonstrated that carefully selected examples can boost math performance in large language models using SFT alone; no huge datasets or RL procedures needed.

Their procedure allows Qwen2.5-32B-Instruct to jump from 6.5% to 57% on AIME and from 59% to 95% on MATH, while using only 1% of the data in previous approaches.

⚡ The Less-is-More Reasoning Hypothesis:
‣ Minimal but precise examples that showcase optimal reasoning patterns matter more than sheer quantity
‣ Pre-training knowledge plus sufficient computational resources at inference time levels up math skills

➡️ Core techniques:
‣ High-quality reasoning chains with self-verification steps
‣ 817 handpicked problems that encourage deeper reasoning
‣ Enough inference-time computation to allow extended reasoning

💪 Efficiency gains:
‣ Only 817 examples instead of 100k+
‣ 40.5% absolute improvement across 10 diverse benchmarks, outperforming models trained on 100x more data
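As a quick sanity check, the headline gains work out directly from the numbers quoted above (arithmetic only, from the post, not from the paper's own code):

```python
# Absolute improvements implied by the scores quoted in the post.
aime_gain = 57.0 - 6.5         # +50.5 points on AIME
math_gain = 95.0 - 59.0        # +36.0 points on MATH
data_fraction = 817 / 100_000  # vs. a typical ~100k-example corpus
print(aime_gain, math_gain, f"{data_fraction:.1%}")
```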

This really challenges the notion that SFT leads to memorization rather than generalization! And opens up reasoning to GPU-poor researchers 🚀

Read the full paper here 👉  LIMO: Less is More for Reasoning (2502.03387)
reacted to fdaudens's post with ❤️ about 2 months ago
posted an update about 2 months ago
I am pleased to introduce my first project built upon Hugging Face’s smolagents framework, integrated with Alpaca for financial market analysis automation 🦙🤗

The project implements technical indicators such as the Relative Strength Index (RSI) and Bollinger Bands to provide momentum and volatility analysis. Market data is retrieved through the Alpaca API, enabling access to historical price information across various timeframes.
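Not the project's actual code, but both indicators are straightforward to sketch with pandas (the default periods and function names below are assumptions):

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index from rolling average gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    return 100 - 100 / (1 + gain / loss)

def bollinger(close: pd.Series, period: int = 20, k: float = 2.0):
    """Middle band (SMA) with upper/lower bands at k standard deviations."""
    mid = close.rolling(period).mean()
    std = close.rolling(period).std()
    return mid - k * std, mid, mid + k * std
```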

AI-powered insights are generated using Hugging Face’s inference API, facilitating the analysis of market trends through natural language processing with DuckDuckGo search integration for real-time sentiment analysis based on financial news 🦆

Link to the GitHub project: https://github.com/louisbrulenaudet/agentic-market-tool

reacted to ImranzamanML's post with 😎 about 2 months ago
Hugging Face just launched the AI Agents Course – a free journey from beginner to expert in AI agents!

- Learn AI Agent fundamentals, use cases and frameworks
- Use top libraries like LangChain & LlamaIndex
- Compete in challenges & earn a certificate
- Hands-on projects & real-world applications

https://huggingface.co/learn/agents-course/unit0/introduction

You can join a live Q&A on Feb 12 at 5 PM CET to learn more about the course here

https://www.youtube.com/live/PopqUt3MGyQ
reacted to m-ric's post with 🚀 3 months ago
𝗪𝗲'𝘃𝗲 𝗷𝘂𝘀𝘁 𝗿𝗲𝗹𝗲𝗮𝘀𝗲𝗱 𝘀𝗺𝗼𝗹𝗮𝗴𝗲𝗻𝘁𝘀 𝘃𝟭.𝟯.𝟬 🚀, and it comes with a major feature: you can now log agent runs using OpenTelemetry to inspect them afterwards! 📊

This interactive format is IMO much easier to inspect big multi-step runs than endless console logs.

The setup is very easy, in a few lines of code.

Find a tutorial here 👉 https://huggingface.co/docs/smolagents/tutorials/inspect_runs
reacted to MonsterMMORPG's post with 🔥 3 months ago
It is now possible to generate 16 Megapixel (4096x4096) raw images with SANA 4K model using under 8GB VRAM, 4 Megapixel (2048x2048) images using under 6GB VRAM, and 1 Megapixel (1024x1024) images using under 4GB VRAM thanks to new optimizations

13 January 2025 Update

Installers : https://www.patreon.com/posts/from-nvidia-labs-116474081

New 4K Tutorial Video : https://youtu.be/GjENQfHF4W8

The app now uses the Diffusers pipeline and has huge VRAM optimizations

You need to reinstall

The models will be downloaded into your Hugging Face cache folder the first time you generate something

How to Get Installation Logs and How to Change Hugging Face Cache Folder :
https://www.patreon.com/posts/108419878

Please make a fresh install

When you enable all 4 optimizations, VRAM usage is as shown below

Make sure shared VRAM is enabled, because the initial loading of the model needs more VRAM

Enable VAE Tiling + Enable VAE Slicing + Enable Model CPU Offload +
Enable Sequential CPU Offload

1K (1024x1024) : 4 GB GPUs
2K (2048x2048) : 6 GB GPUs
4K (4096x4096) : 8 GB GPUs
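The tier list above pairs each resolution with a minimum VRAM budget when all four optimizations are enabled; the megapixel figures follow from width × height (a quick restatement using the post's numbers):

```python
# Resolution tiers from the post, with min VRAM in GB when all four
# optimizations (VAE tiling, VAE slicing, model CPU offload, sequential
# CPU offload) are enabled.
tiers = {(1024, 1024): 4, (2048, 2048): 6, (4096, 4096): 8}

for (w, h), gb in tiers.items():
    print(f"{w}x{h} = {w * h / 1e6:.1f} MP -> ~{gb} GB VRAM")
```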

It may still work on your GPU in any case, so test it

Enabling just VAE Tiling + Model CPU Offload works great in many cases

All the images attached below were generated with the SANA 4K model; they are raw, and their resolution is 5376x3072

Official repo page : https://github.com/NVlabs/Sana
reacted to anakin87's post with ❤️ 4 months ago
Tulu 3 SFT Mixture by AllenAI is a massive, high-quality multilingual dataset for fine-tuning language models.

Unfortunately, it was missing the "language" column.

I added it using the good old fastText.
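A minimal sketch of that kind of annotation pass, with the detector left pluggable (in practice it would wrap fastText's lid.176 model; the row layout and helper name here are assumptions, not the dataset's actual code):

```python
from typing import Callable

def add_language_column(
    rows: list[dict], detect: Callable[[str], str]
) -> list[dict]:
    """Annotate each chat-formatted row with the language of its first turn."""
    for row in rows:
        # fastText expects single-line input, so collapse newlines first.
        text = row["messages"][0]["content"].replace("\n", " ")
        row["language"] = detect(text)
    return rows
```

With fastText itself, `detect` would load `lid.176.bin` and return the top predicted label for the text.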

Check out the dataset here 👉 anakin87/tulu-3-sft-mixture-with-language
