
That Time I got Reincarnated as a Hugging Face Organization
AI & ML interests
LowRes animated waifus (✿◡‿◡)
lowres's activity

louisbrulenaudet posted an update 23 days ago
I’ve just released logfire-callback on PyPI, designed to facilitate monitoring of Hugging Face Transformer training loops using Pydantic Logfire 🤗
The callback automatically logs the training start (with configuration parameters), periodic metrics, and training completion ⏱️
Install the package using pip:
pip install logfire-callback
First, ensure you have a Logfire API token and set it as an environment variable:
export LOGFIRE_TOKEN=your_logfire_token
Then use the callback in your training code:
from transformers import Trainer, TrainingArguments
from logfire_callback import LogfireCallback

# Initialize your model, dataset, etc.

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    # ... other training arguments
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    callbacks=[LogfireCallback()]  # Add the Logfire callback here
)

trainer.train()
If you have any feedback, please reach out to @louisbrulenaudet
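For readers curious how a callback like this works under the hood: a Trainer callback is a class whose hook methods the training loop invokes at fixed points. The dependency-free sketch below illustrates the pattern described above (log training start with configuration, periodic metrics, and completion); the class name and `sink` parameter are illustrative, not the actual logfire-callback API:

```python
# Hypothetical sketch of the callback pattern (not the real LogfireCallback):
# hook methods fire at training start, at each metric log, and at completion.
class LoggingCallback:
    def __init__(self, sink):
        self.sink = sink  # any callable that records an event, e.g. a logger

    def on_train_begin(self, config):
        self.sink(("train_begin", dict(config)))  # configuration parameters

    def on_log(self, metrics):
        self.sink(("metrics", dict(metrics)))     # periodic training metrics

    def on_train_end(self):
        self.sink(("train_end", {}))              # training completion
```

The real callback follows the `transformers.TrainerCallback` interface, whose hooks receive the trainer's state and arguments instead of raw dicts.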
Post
There seem to be multiple paid apps shared here that are based on models hosted on HF, but some people sell their wrappers as "products" and promote them here. For a long time, HF was the best and only platform for open-source model work, but with the recent AI website builders anyone can create a product (really crappy ones, by the way) and try to sell it without contributing anything back to open source. Please don't do this, or at least try fine-tuning the models you use...
Sorry for filling your feeds with this, but you know...
Post
Gemma 3 seems to be really good at human preference evaluations. Just waiting for people to notice.

louisbrulenaudet posted an update about 2 months ago
I am pleased to introduce my first project built upon Hugging Face’s smolagents framework, integrated with Alpaca for financial market analysis automation 🦙🤗
The project implements technical indicators such as the Relative Strength Index (RSI) and Bollinger Bands to provide momentum and volatility analysis. Market data is retrieved through the Alpaca API, enabling access to historical price information across various timeframes.
AI-powered insights are generated using Hugging Face’s inference API, facilitating the analysis of market trends through natural language processing with DuckDuckGo search integration for real-time sentiment analysis based on financial news 🦆
Link to the GitHub project: https://github.com/louisbrulenaudet/agentic-market-tool
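For reference, the two indicators mentioned can be sketched in a few lines of NumPy. This is a generic illustration (simple-average RSI variant, population standard deviation), not the tool's actual implementation:

```python
import numpy as np

def rsi(closes: np.ndarray, period: int = 14) -> float:
    """Relative Strength Index over the last `period` price changes
    (simple-average variant of Wilder's RSI)."""
    deltas = np.diff(closes)
    gains = np.clip(deltas, 0, None)     # positive moves
    losses = np.clip(-deltas, 0, None)   # negative moves (as positives)
    avg_gain = gains[-period:].mean()
    avg_loss = losses[-period:].mean()
    if avg_loss == 0:
        return 100.0                     # no losses -> maximal momentum
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def bollinger(closes: np.ndarray, window: int = 20, k: float = 2.0):
    """Middle band (SMA) plus upper/lower bands at k standard deviations."""
    recent = closes[-window:]
    mid = recent.mean()
    std = recent.std()
    return mid - k * std, mid, mid + k * std
```

RSI near 100 signals strong upward momentum; prices touching the outer Bollinger bands flag unusually high volatility relative to the recent window.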
Post
R1 is out! And with a lot of other R1-related models...

not-lain updated a Space 3 months ago
Post
Happy New Year 2025 🤗
For the Hugging Face community.

peaceAsh authored a paper 4 months ago

spedrox-sac updated a Space 4 months ago

louisbrulenaudet posted an update 5 months ago
I’ve published a new dataset to simplify model merging 🤗
This dataset facilitates the search for compatible architectures for model merging with @arcee_ai’s mergekit, streamlining the automation of high-performance merge searches 📖
Dataset : louisbrulenaudet/mergekit-configs
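The core idea behind such a dataset, grouping models by architecture so that merge-compatible candidates can be found automatically, can be sketched in plain Python. The rows below are hypothetical stand-ins, not records from the actual dataset:

```python
from collections import defaultdict

# Hypothetical rows mimicking a merge-config dataset: each entry pairs a model
# with its architecture. Only models sharing an architecture can be merged.
rows = [
    {"model": "model-a", "architecture": "LlamaForCausalLM"},
    {"model": "model-b", "architecture": "LlamaForCausalLM"},
    {"model": "model-c", "architecture": "MistralForCausalLM"},
]

def merge_candidates(rows):
    """Group model names by architecture and keep groups with 2+ members,
    i.e. the sets of models that are candidates for a mergekit merge."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["architecture"]].append(row["model"])
    return {arch: models for arch, models in groups.items() if len(models) > 1}
```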

louisbrulenaudet posted an update 6 months ago
Introducing Lemone-router, a series of classification models designed to produce an optimal multi-agent system for different branches of tax law.
Trained on a base of 49k lines, comprising synthetic questions generated by GPT-4 Turbo and Llama 3.1 70B, further refined through evol-instruct tuning and manual curation, together with authority documents, these models are based on an 8-category decomposition of the classification scheme derived from the Bulletin officiel des finances publiques - impôts:
label2id = {
    "Bénéfices professionnels": 0,
    "Contrôle et contentieux": 1,
    "Dispositifs transversaux": 2,
    "Fiscalité des entreprises": 3,
    "Patrimoine et enregistrement": 4,
    "Revenus particuliers": 5,
    "Revenus patrimoniaux": 6,
    "Taxes sur la consommation": 7
}

id2label = {
    0: "Bénéfices professionnels",
    1: "Contrôle et contentieux",
    2: "Dispositifs transversaux",
    3: "Fiscalité des entreprises",
    4: "Patrimoine et enregistrement",
    5: "Revenus particuliers",
    6: "Revenus patrimoniaux",
    7: "Taxes sur la consommation"
}
It achieves the following results on the evaluation set:
- Loss: 0.4734
- Accuracy: 0.9191
Link to the collection: louisbrulenaudet/lemone-router-671cce21d6410f3570514762
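In a multi-agent system, the 8-way classification above drives a simple dispatch step: the router predicts a category id, and the query is handed to the specialist agent for that branch. A minimal sketch, where the `agents` mapping and the predicted id are hypothetical placeholders for a real classifier's output:

```python
# Hypothetical routing sketch: dispatch a query to the specialist agent for
# the tax-law branch predicted by a Lemone-router classifier.
id2label = {
    0: "Bénéfices professionnels",
    1: "Contrôle et contentieux",
    2: "Dispositifs transversaux",
    3: "Fiscalité des entreprises",
    4: "Patrimoine et enregistrement",
    5: "Revenus particuliers",
    6: "Revenus patrimoniaux",
    7: "Taxes sur la consommation",
}

def route(predicted_id: int, agents: dict):
    """Map a predicted class id to the agent registered for that category."""
    return agents[id2label[predicted_id]]
```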

louisbrulenaudet posted an update 6 months ago
🚨 I have $3,500 in Azure credits, including access to an H100 (96 GB), expiring on November 12, 2024.
I won't be able to use it all myself, so I'm reaching out to the @huggingface community: are there any open-source projects with data ready for some compute power?
Let’s collaborate and make the most of it together 🔗

louisbrulenaudet posted an update 6 months ago
My biggest release of the year: a series of 7 specialized embedding models for information retrieval within tax documents is now available for free on Hugging Face 🤗
These new models aim to offer an open source alternative for in-domain semantic search from large text corpora and will improve RAG systems and context addition for large language models.
Trained on more than 43 million tax tokens derived from semi-synthetic and raw-synthetic data, enriched by various methods (in particular MSFT's evol-instruct by @intfloat ), and corrected by humans, this project is the fruit of hundreds of hours of work and is the culmination of a global effort to open up legal technologies that has only just begun.
A big thank you to Microsoft for Startups for giving me access to state-of-the-art infrastructure to train these models, and to @julien-c, @clem 🤗, @thomwolf and the whole HF team for the inference endpoint API and the generous provision of Meta Llama-3.1-70B. Special thanks also to @tomaarsen for his invaluable advice on training embedding models and loss functions ❤️
Models are available on my personal HF page, in the Lemone-embed collection: louisbrulenaudet/lemone-embed-66fdc24000df732b395df29b
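The in-domain semantic search these models enable reduces, at query time, to ranking documents by cosine similarity between embedding vectors. A generic sketch of that step (illustrative only, not tied to any particular model from the collection):

```python
import numpy as np

# Generic semantic-search sketch: rank documents by cosine similarity between
# a query embedding and precomputed document embeddings, as produced by an
# embedding model such as those in the collection.
def top_k(query_emb: np.ndarray, doc_embs: np.ndarray, k: int = 2) -> np.ndarray:
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarity per document
    return np.argsort(scores)[::-1][:k]  # indices of the k best matches
```

In a RAG system, the top-ranked documents are then prepended to the LLM prompt as retrieved context.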

louisbrulenaudet posted an update 7 months ago
The Romulus model series has been released on Hugging Face, continually pre-trained on 34,864,949 tokens of French laws and intended to serve as a foundation for fine-tuning on labeled data 🤗
The training code, dataset and model weights are open and freely available on HF, and training ran on an H100 provided by Microsoft for Startups, using Unsloth AI by @danielhanchen and @shimmyshimmer 🦥
Link to the base model: louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1
Link to the instruct model: louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1-Instruct
Link to the dataset: louisbrulenaudet/Romulus-cpt-fr
Please note that these models have not been aligned to produce usable text as-is, and will certainly need fine-tuning for the desired tasks to produce satisfactory results.
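A token budget like the 34,864,949 tokens above is typically consumed by concatenating tokenized documents and cutting them into fixed-length blocks for causal-LM pre-training. A generic sketch of that preprocessing step, not the project's actual code:

```python
# Generic continual-pre-training prep sketch: tokenized documents are
# concatenated and split into equal-length blocks, dropping the ragged tail.
def pack_tokens(token_streams, block_size):
    """Flatten token streams and cut them into fixed-size training blocks."""
    flat = [tok for stream in token_streams for tok in stream]
    usable = (len(flat) // block_size) * block_size  # drop the incomplete tail
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]
```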