
Louis Brulé Naudet

louisbrulenaudet

AI & ML interests

Research in business taxation and development, University Dauphine-PSL 📖 | Backed by the Microsoft for Startups Hub program and Google Cloud Platform for startups program | Hugging Face for Legal 🤗


Organizations

MISATO-dataset, OpenVINO Toolkit, ONNXConfig for all, Gradio-Themes-Party, scikit-learn, Open-Source AI Meetup, BigLAM: BigScience Libraries, Archives and Museums, Université Dauphine-PSL, Stable Diffusion Dreambooth Concepts Library, Blog-explorers, OpenOrca, OpenLLM France, huggingPartyParis, Qwen, That Time I got Reincarnated as a Hugging Face Organization, ZeroGPU Explorers, Journalists on Hugging Face, Major TOM, MLX Community, Lemone, Social Post Explorers, Cognitive Computations, C4AI Community, Haiku, Dev Mode Explorers, Hugging Face for Legal, Hugging Face Discord Community, Dataset Tools, Data Is Better Together Contributor

louisbrulenaudet's activity

reacted to Smooke's post with 👀 4 days ago
AI Search Traffic Marketshare for Calling HackerNoon Blogs: 52% OpenAI, 30% Amazon & 18% Perplexity: https://hackernoon.com/ai-search-traffic-marketshare-for-calling-hackernoon-blogs-52percent-openai-30percent-amazon-and-18percent-perplexity

OpenAI (51.8%) leads AI search traffic market share, based on my analysis of end-user–initiated AI Assistant and AI Search requests to HackerNoon. While Amazon (30.4%) and Perplexity (17.9%) also secured significant portions of the market, the total volume of requests (1,915,670 in 30 days) and competition among AI search providers indicate increasing reliance on AI for information retrieval and presentation.

This analysis aggregates AI Assistant and AI Search queries to approximate end-user–initiated AI search traffic across HackerNoon URLs. Non-human traffic such as web crawlers, bots, and automated scripts has been filtered out to ensure the data reflects only human-initiated requests. The dataset reviewed comprises instances where AI systems recommended HackerNoon content in response to human queries. Between February 28 and March 28, 2025, HackerNoon received 1,915,670 AI-referred search requests. OpenAI accounted for 991,580 requests, Amazon for 581,990, and Perplexity for 342,100, according to the Cloudflare AI Audit tool, which currently tracks these top providers. HackerNoon serves a technical audience, so our data is well positioned to answer questions like: if you work in tech, which AI search engine do you rely on?
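The percentages quoted above follow directly from the raw request counts; a quick sanity check in Python (numbers taken from the post):

```python
# Recomputing the market-share percentages from the raw counts in the post.
requests = {"OpenAI": 991_580, "Amazon": 581_990, "Perplexity": 342_100}
total = sum(requests.values())  # 1,915,670 requests over 30 days

shares = {name: round(100 * n / total, 1) for name, n in requests.items()}
print(shares)  # {'OpenAI': 51.8, 'Amazon': 30.4, 'Perplexity': 17.9}
```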

Continue Reading... https://hackernoon.com/ai-search-traffic-marketshare-for-calling-hackernoon-blogs-52percent-openai-30percent-amazon-and-18percent-perplexity
reacted to mlabonne's post with 🔥 14 days ago
reacted to clem's post with 🔥 14 days ago
Nice new space to see how fast your personal or organization's followers are growing on HF:
julien-c/follow-history

As you can see, I still have more followers than @julien-c even if he's trying to change this by building such cool spaces 😝😝😝
reacted to Jaward's post with 🔥 14 days ago
reacted to merve's post with 🤗 14 days ago
So many open releases at Hugging Face this past week 🤯 recapping all of them here ⤵️ merve/march-21-releases-67dbe10e185f199e656140ae

👀 Multimodal
> Mistral AI released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS)
> with IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS)
> SpatialLM is a new vision LM that outputs 3D bounding boxes, comes with 0.5B (QwenVL based) and 1B (Llama based) variants
> SkyWork released SkyWork-R1V-38B, new vision reasoning model (OS)

💬 LLMs
> NVIDIA released new Nemotron models in 49B and 8B with their post-training dataset
> LG released EXAONE, new reasoning models in 2.4B, 7.8B and 32B
> Dataset: Glaive AI released a new reasoning dataset of 22M+ examples
> Dataset: NVIDIA released new helpfulness dataset HelpSteer3
> Dataset: OpenManusRL is a new agent dataset based on ReAct framework (OS)
> Open-R1 team released OlympicCoder, new competitive coder model in 7B and 32B
> Dataset: GeneralThought-430K is a new reasoning dataset (OS)

🖼️ Image Generation/Computer Vision
> Roboflow released RF-DETR, new real-time sota object detector (OS) 🔥
> YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹
> Stability AI released Stable Virtual Camera, a new novel view synthesis model
> Tencent released Hunyuan3D-2mini, new small and fast 3D asset generation model
> ByteDance released InfiniteYou, new realistic photo generation model
> StarVector is a new 8B model that generates svg from images
> FlexWorld is a new model that expands 3D views (OS)

🎤 Audio
> Sesame released CSM-1B new speech generation model (OS)

🤖 Robotics
> NVIDIA released GR00T, new robotics model for generalized reasoning and skills, along with the dataset

*OS ones have Apache 2.0 or MIT license
posted an update 14 days ago
I’ve just released logfire-callback on PyPI, designed to facilitate monitoring of Hugging Face Transformer training loops using Pydantic Logfire 🤗

The callback automatically logs the training start (with configuration parameters), periodic metrics, and training completion ⏱️

Install the package using pip:
pip install logfire-callback

First, ensure you have a Logfire API token and set it as an environment variable:
export LOGFIRE_TOKEN=your_logfire_token

Then use the callback in your training code:
from transformers import Trainer, TrainingArguments
from logfire_callback import LogfireCallback

# Initialize your model, dataset, etc.

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    # ... other training arguments
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    callbacks=[LogfireCallback()]  # Add the Logfire callback here
)

trainer.train()

If you have any feedback, please reach out to @louisbrulenaudet
reacted to m-ric's post with 🤗 20 days ago
reacted to clem's post with 🔥 about 2 months ago
What are the best organizations to follow on @huggingface ?

Off the top of my head:
- Deepseek (35,000 followers): deepseek-ai
- Meta Llama (27,000 followers): meta-llama
- Black Forest Labs (11,000 followers): black-forest-labs
- OpenAI (5,000 followers): openai
- Nvidia (16,000 followers): nvidia
- Microsoft (9,000 followers): microsoft
- AllenAI (2,000 followers): allenai
- Mistral (5,000 followers): mistralai
- XAI (600 followers): xai-org
- Stability AI (16,000 followers): stabilityai
- Qwen (16,000 followers): Qwen
- GoogleAI (8,000 followers): google
- Unsloth (3,000 followers): unsloth
- Bria AI (4,000 followers): briaai
- NousResearch (1,300 followers): NousResearch

Bonus, the agent course org with 17,000 followers: agents-course
reacted to davanstrien's post with 👍 about 2 months ago
reacted to m-ric's post with 👀 about 2 months ago
𝗚𝗿𝗲𝗮𝘁 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗮𝗹𝗲𝗿𝘁: you can now share agents to the Hub! 🥳🥳

And any agent pushed to the Hub gets a cool Space interface to chat with it directly.

This was a real technical challenge: for instance, serializing tools to export them meant you needed to get all the source code for a tool, verify that it was standalone (not relying on external variables), and gather all the packages required to make it run.
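smolagents' real serializer is more involved, but the "standalone" check described above can be sketched in a few lines (the function name and heuristic below are mine, not the library's): a function is exportable as plain source only if it closes over nothing and every global name it references is a builtin.

```python
import builtins

def is_standalone(fn) -> bool:
    """Heuristic: fn can be serialized as plain source text only if it has no
    closure cells and every global name it touches resolves to a builtin."""
    if fn.__closure__:
        return False  # depends on enclosing-scope variables
    return all(
        hasattr(builtins, name) or name not in fn.__globals__
        for name in fn.__code__.co_names
    )
```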

Go try it out! 👉 https://github.com/huggingface/smolagents
reacted to merve's post with 👍 about 2 months ago
Your weekly recap of open AI is here, and it's packed with models! merve/feb-14-releases-67af876b404cc27c6d837767

👀 Multimodal
> OpenGVLab released InternVideo 2.5 Chat models, new video LMs with long context
> AIDC released Ovis2 model family along with Ovis dataset, new vision LMs in different sizes (1B, 2B, 4B, 8B, 16B, 34B), with video and OCR support
> ColQwenStella-2b is a multilingual visual retrieval model that is sota in its size class
> Hoags-2B-Exp is a new multilingual vision LM with contextual reasoning, long context video understanding

💬 LLMs
A lot of math models!
> Open-R1 team released OpenR1-Math-220k, a large-scale math reasoning dataset, along with OpenR1-Qwen-7B, a Qwen2.5-Math fine-tune trained on it
> Nomic AI released a new Nomic Embed multilingual retrieval model, a MoE with ~500M total and 305M active params, outperforming other models
> DeepScaleR-1.5B-Preview is a new DeepSeek-R1-Distill fine-tune using distributed RL on math
> LIMO is a new fine-tune of Qwen2.5-32B-Instruct on Math

🗣️ Audio
> Zonos-v0.1 is a new family of text-to-speech models, which contains the model itself and embeddings

🖼️ Vision and Image Generation
> We have ported DepthPro of Apple to transformers for your convenience!
> illustrious-xl-v1.0 is a new illustration generation model
reacted to fffiloni's post with 🔥 about 2 months ago
reacted to clem's post with 🔥 about 2 months ago
We've crossed 1B+ tokens routed to our inference provider partners on HF, a feature we released just a few days ago.

Just getting started, of course, but early users seem to like it, and we're always happy to partner with cool startups in the ecosystem.

Have you been using any integration and how can we make it better?

https://huggingface.co/blog/inference-providers
reacted to m-ric's post with 🚀 about 2 months ago
Less is More for Reasoning (LIMO): a 32B model fine-tuned with 817 examples can beat o1-preview on math reasoning! 🤯

Do we really need o1's huge RL procedure to see reasoning emerge? It seems not.
Researchers from Shanghai Jiao Tong University just demonstrated that carefully selected examples can boost math performance in large language models using SFT alone; no huge datasets or RL procedures needed.

Their procedure allows Qwen2.5-32B-Instruct to jump from 6.5% to 57% on AIME and from 59% to 95% on MATH, while using only 1% of the data in previous approaches.

⚡ The Less-is-More Reasoning Hypothesis:
‣ Minimal but precise examples that showcase optimal reasoning patterns matter more than sheer quantity
‣ Pre-training knowledge plus sufficient computational resources at inference time levels up math skills

➡️ Core techniques:
‣ High-quality reasoning chains with self-verification steps
‣ 817 handpicked problems that encourage deeper reasoning
‣ Enough inference-time computation to allow extended reasoning

💪 Efficiency gains:
‣ Only 817 examples instead of 100k+
‣ 40.5% absolute improvement across 10 diverse benchmarks, outperforming models trained on 100x more data
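As a quick sanity check, the headline gains work out directly from the numbers quoted above (arithmetic only, from the post, not from the paper's own code):

```python
# Absolute improvements implied by the scores quoted in the post.
aime_gain = 57.0 - 6.5         # +50.5 points on AIME
math_gain = 95.0 - 59.0        # +36.0 points on MATH
data_fraction = 817 / 100_000  # vs. a typical ~100k-example corpus
print(aime_gain, math_gain, f"{data_fraction:.1%}")
```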

This really challenges the notion that SFT leads to memorization rather than generalization! And opens up reasoning to GPU-poor researchers 🚀

Read the full paper here 👉  LIMO: Less is More for Reasoning (2502.03387)
reacted to fdaudens's post with ❤️ about 2 months ago
posted an update about 2 months ago
I am pleased to introduce my first project built upon Hugging Face’s smolagents framework, integrated with Alpaca for financial market analysis automation 🦙🤗

The project implements technical indicators such as the Relative Strength Index (RSI) and Bollinger Bands to provide momentum and volatility analysis. Market data is retrieved through the Alpaca API, enabling access to historical price information across various timeframes.
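Not the project's actual code, but both indicators are straightforward to sketch with pandas (the default periods and function names below are assumptions):

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index from rolling average gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    return 100 - 100 / (1 + gain / loss)

def bollinger(close: pd.Series, period: int = 20, k: float = 2.0):
    """Middle band (SMA) with upper/lower bands at k standard deviations."""
    mid = close.rolling(period).mean()
    std = close.rolling(period).std()
    return mid - k * std, mid, mid + k * std
```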

AI-powered insights are generated using Hugging Face’s inference API, facilitating the analysis of market trends through natural language processing with DuckDuckGo search integration for real-time sentiment analysis based on financial news 🦆

Link to the GitHub project: https://github.com/louisbrulenaudet/agentic-market-tool

reacted to ImranzamanML's post with 😎 about 2 months ago
Hugging Face just launched the AI Agents Course – a free journey from beginner to expert in AI agents!

- Learn AI Agent fundamentals, use cases and frameworks
- Use top libraries like LangChain & LlamaIndex
- Compete in challenges & earn a certificate
- Hands-on projects & real-world applications

https://huggingface.co/learn/agents-course/unit0/introduction

You can join a live Q&A on Feb 12 at 5 PM CET to learn more about the course here

https://www.youtube.com/live/PopqUt3MGyQ
reacted to m-ric's post with 🚀 3 months ago
𝗪𝗲'𝘃𝗲 𝗷𝘂𝘀𝘁 𝗿𝗲𝗹𝗲𝗮𝘀𝗲𝗱 𝘀𝗺𝗼𝗹𝗮𝗴𝗲𝗻𝘁𝘀 𝘃𝟭.𝟯.𝟬 🚀, and it comes with a major feature: you can now log agent runs using OpenTelemetry to inspect them afterwards! 📊

This interactive format is IMO much easier to inspect big multi-step runs than endless console logs.

The setup is very easy, in a few lines of code.

Find a tutorial here 👉 https://huggingface.co/docs/smolagents/tutorials/inspect_runs
reacted to MonsterMMORPG's post with 🔥 3 months ago
It is now possible to generate 16 Megapixel (4096x4096) raw images with SANA 4K model using under 8GB VRAM, 4 Megapixel (2048x2048) images using under 6GB VRAM, and 1 Megapixel (1024x1024) images using under 4GB VRAM thanks to new optimizations

13 January 2025 Update

Installers : https://www.patreon.com/posts/from-nvidia-labs-116474081

New 4K Tutorial Video : https://youtu.be/GjENQfHF4W8

The app now uses the Diffusers pipeline and has huge VRAM optimizations

You need to reinstall

The models will be downloaded into your Hugging Face cache folder the first time you generate something

How to Get Installation Logs and How to Change Hugging Face Cache Folder :
https://www.patreon.com/posts/108419878

Please make a fresh install

When you enable all 4 optimizations, VRAM usage is as shown below

Make sure shared VRAM is enabled, because the initial loading of the model needs more VRAM

Enable VAE Tiling + Enable VAE Slicing + Enable Model CPU Offload +
Enable Sequential CPU Offload

1K (1024x1024) : 4 GB GPUs
2K (2048x2048) : 6 GB GPUs
4K (4096x4096) : 8 GB GPUs
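The tier list above pairs each resolution with a minimum VRAM budget when all four optimizations are enabled; the megapixel figures follow from width × height (a quick restatement using the post's numbers):

```python
# Resolution tiers from the post, with min VRAM in GB when all four
# optimizations (VAE tiling, VAE slicing, model CPU offload, sequential
# CPU offload) are enabled.
tiers = {(1024, 1024): 4, (2048, 2048): 6, (4096, 4096): 8}

for (w, h), gb in tiers.items():
    print(f"{w}x{h} = {w * h / 1e6:.1f} MP -> ~{gb} GB VRAM")
```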

It may still work on your GPU in any case, so test it

Enabling just VAE Tiling + Model CPU Offload works great in many cases

All the images attached below were generated with the SANA 4K model; they are raw, and their resolution is 5376x3072

Official repo page : https://github.com/NVlabs/Sana
reacted to anakin87's post with ❤️ 4 months ago
Tulu 3 SFT Mixture by AllenAI is a massive, high-quality multilingual dataset for fine-tuning language models.

Unfortunately, it was missing the "language" column.

I added it using the good old fastText.
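A minimal sketch of that kind of annotation pass, with the detector left pluggable (in practice it would wrap fastText's lid.176 model; the row layout and helper name here are assumptions, not the dataset's actual code):

```python
from typing import Callable

def add_language_column(
    rows: list[dict], detect: Callable[[str], str]
) -> list[dict]:
    """Annotate each chat-formatted row with the language of its first turn."""
    for row in rows:
        # fastText expects single-line input, so collapse newlines first.
        text = row["messages"][0]["content"].replace("\n", " ")
        row["language"] = detect(text)
    return rows
```

With fastText itself, `detect` would load `lid.176.bin` and return the top predicted label for the text.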

Check out the dataset here 👉 anakin87/tulu-3-sft-mixture-with-language
