Alessandro Ercolani

giux78

AI & ML interests

NLP, Reinforcement Learning, Semantics, Computational Neuroscience

Recent Activity

updated a dataset 4 days ago
mii-llm/requests
View all activity

Organizations

Rocket AI's profile picture Spaces-explorers's profile picture Blog-explorers's profile picture FairMind's profile picture Business Operating System's profile picture mii-community's profile picture Social Post Explorers's profile picture mii-llm's profile picture Coloss's profile picture

giux78's activity

reacted to their post with 👀 2 days ago
view post
Post
2214
LLAMA4 release highlight the importance of political and social bias. According to their own evaluation described in the release blog post:
- Refusals on contentious prompts dropped from 7% (hashtag#LLAMA 3.3) to under 2%
- Unequal response refusals are now under 1%
- Political lean bias is said to be halved compared to hashtag#LLaMA 3.3 and comparable to Grok

However, we @efederici @mferraretto @FinancialSupport and I released some weeks ago an independent open source benchmark called Propaganda to measure political bias in LLMs: https://github.com/mii-llm/propaganda

In the chart below, we evaluated multiple leading models on the basis of ratings across a range of prompts designed to expose ideological leanings.

Despite Meta’s stated neutrality goals, LLAMA4 ranks at the very top in terms of total ratings aligned with a clear ideological bias. The models were tested on their ability to respond even-handedly to politically sensitive prompts. LLaMA 4 scored even higher than models known for strong alignment policies like GPT-4o.

LLMs may be refusing less, but they still show bias through content framing. This suggests that refusal rates alone are not a sufficient measure of ideological bias. Relying solely on internal evaluations from AI labs also raises concerns about transparency and objectivity.
posted an update 5 days ago
view post
Post
2214
LLAMA4 release highlight the importance of political and social bias. According to their own evaluation described in the release blog post:
- Refusals on contentious prompts dropped from 7% (hashtag#LLAMA 3.3) to under 2%
- Unequal response refusals are now under 1%
- Political lean bias is said to be halved compared to hashtag#LLaMA 3.3 and comparable to Grok

However, we @efederici @mferraretto @FinancialSupport and I released some weeks ago an independent open source benchmark called Propaganda to measure political bias in LLMs: https://github.com/mii-llm/propaganda

In the chart below, we evaluated multiple leading models on the basis of ratings across a range of prompts designed to expose ideological leanings.

Despite Meta’s stated neutrality goals, LLAMA4 ranks at the very top in terms of total ratings aligned with a clear ideological bias. The models were tested on their ability to respond even-handedly to politically sensitive prompts. LLaMA 4 scored even higher than models known for strong alignment policies like GPT-4o.

LLMs may be refusing less, but they still show bias through content framing. This suggests that refusal rates alone are not a sufficient measure of ideological bias. Relying solely on internal evaluations from AI labs also raises concerns about transparency and objectivity.
reacted to tomaarsen's post with 🔥 15 days ago
view post
Post
2364
‼️Sentence Transformers v4.0 is out! You can now train and finetune reranker models with multi-GPU training, bf16 support, loss logging, callbacks & much more. I also prove that finetuning on your domain helps much more than you might think.

1️⃣ Reranker Training Refactor
Reranker models can now be trained using an extensive trainer with a lot of powerful features:
- MultiGPU Training (Data Parallelism (DP) and Distributed Data Parallelism (DDP))
- bf16 training support; loss logging
- Evaluation datasets + evaluation loss
- Improved callback support + an excellent Weights & Biases integration
- Gradient checkpointing, gradient accumulation
- Model card generation
- Resuming from a training checkpoint without performance loss
- Hyperparameter Optimization
and much more!

Read my detailed blogpost to learn about the components that make up this new training approach: https://huggingface.co/blog/train-reranker
Notably, the release is fully backwards compatible: all deprecations are soft, meaning that they still work but emit a warning informing you how to upgrade.

2️⃣ New Reranker Losses
- 11 new losses:
- 2 traditional losses: BinaryCrossEntropy and CrossEntropy
- 2 distillation losses: MSE and MarginMSE
- 2 in-batch negatives losses: MNRL (a.k.a. InfoNCE) and CMNRL
- 5 learning to rank losses: Lambda, p-ListMLE, ListNet, RankNet, ListMLE

3️⃣ New Reranker Documentation
- New Training Overview, Loss Overview, API Reference docs
- 5 new, 1 refactored training examples docs pages
- 13 new, 6 refactored training scripts
- Migration guides (2.x -> 3.x, 3.x -> 4.x)

4️⃣ Blogpost
Alongside the release, I've written a blogpost where I finetune ModernBERT on a generic question-answer dataset. My finetunes easily outperform all general-purpose reranker models, even models 4x as big. Finetuning on your domain is definitely worth it: https://huggingface.co/blog/train-reranker

See the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/v4.0.1
reacted to their post with 👀🤗 17 days ago
view post
Post
3170
This is truly an inspirational story please help us spread the word, @clem , @thomwolf and everyone who supports open source AI.

A few weeks ago, @mmuffo94 and @cittiberto from indigo_ai launched the Chatbot Arena for the Italian language: https://indigo.ai/it/chatbot-arena-italia/.

To our surprise, among the top-ranked models is mii-llm/maestrale-chat-v0.4-beta a carefully fine-tuned version of mistralai/Mistral-7B-v0.1, developed by @efederici and @mferraretto from mii-llm , and released nearly a year ago.

At this very moment, as shown in the screenshot, mii-llm/maestrale-chat-v0.4-beta is ranked 8th right between ChatGPT-4.5 and ChatGPT-4o.

It's likely that for several months, the best Italian speaking LLM has been an open source 7B model created by open source contributors and hardly anyone knew it.
  • 2 replies
·
posted an update 17 days ago
view post
Post
3170
This is truly an inspirational story please help us spread the word, @clem , @thomwolf and everyone who supports open source AI.

A few weeks ago, @mmuffo94 and @cittiberto from indigo_ai launched the Chatbot Arena for the Italian language: https://indigo.ai/it/chatbot-arena-italia/.

To our surprise, among the top-ranked models is mii-llm/maestrale-chat-v0.4-beta a carefully fine-tuned version of mistralai/Mistral-7B-v0.1, developed by @efederici and @mferraretto from mii-llm , and released nearly a year ago.

At this very moment, as shown in the screenshot, mii-llm/maestrale-chat-v0.4-beta is ranked 8th right between ChatGPT-4.5 and ChatGPT-4o.

It's likely that for several months, the best Italian speaking LLM has been an open source 7B model created by open source contributors and hardly anyone knew it.
  • 2 replies
·
reacted to their post with ❤️ 20 days ago
view post
Post
2860
@ mii-llm with @efederici @mferraretto @FinancialSupport and @DeepMount00 we just released #Propaganda a framework designed to evaluate and train LLMs on political opinions and bias. We aim to analyze both open-source and closed-source LLMs to understand the political positions and biases expressed in their outputs. Moreover we provide a set of recipes to enforce political positions into the models by creating ad hoc curated datasets and by applying fine tuning techniques. By releasing our work in the open, we hope to foster contributions: https://github.com/mii-llm/propaganda

This framework offers opportunities for expansion in various directions and could become the standard reference for evaluating LLMs on political topics, particularly those that influence public opinion.
posted an update 27 days ago
view post
Post
2860
@ mii-llm with @efederici @mferraretto @FinancialSupport and @DeepMount00 we just released #Propaganda a framework designed to evaluate and train LLMs on political opinions and bias. We aim to analyze both open-source and closed-source LLMs to understand the political positions and biases expressed in their outputs. Moreover we provide a set of recipes to enforce political positions into the models by creating ad hoc curated datasets and by applying fine tuning techniques. By releasing our work in the open, we hope to foster contributions: https://github.com/mii-llm/propaganda

This framework offers opportunities for expansion in various directions and could become the standard reference for evaluating LLMs on political topics, particularly those that influence public opinion.
upvoted an article 27 days ago
published an article 27 days ago
updated a Space about 2 months ago