Alessandro Ercolani
giux78
AI & ML interests
NLP, Reinforcement Learning, Semantics, Computational Neuroscience
Recent Activity
updated a dataset about 8 hours ago: mii-llm/requests
reacted to their post with 👀 1 day ago
The LLaMA 4 release highlights the importance of political and social bias. According to Meta’s own evaluation described in the release blog post:
- Refusals on contentious prompts dropped from 7% (#LLaMA 3.3) to under 2%
- Unequal response refusals are now under 1%
- Political lean bias is said to be halved compared to #LLaMA 3.3 and comparable to Grok
However, @efederici, @mferraretto, @FinancialSupport and I released an independent open-source benchmark called Propaganda a few weeks ago to measure political bias in LLMs: https://github.com/mii-llm/propaganda
In the chart below, we evaluated multiple leading models on the basis of ratings across a range of prompts designed to expose ideological leanings.
Despite Meta’s stated neutrality goals, LLaMA 4 ranks at the very top in total ratings aligned with a clear ideological bias. The models were tested on their ability to respond even-handedly to politically sensitive prompts, and LLaMA 4 scored even higher than models known for strong alignment policies, such as GPT-4o.
LLMs may be refusing less, but they still show bias through content framing. This suggests that refusal rates alone are not a sufficient measure of ideological bias, and relying solely on internal evaluations from AI labs raises concerns about transparency and objectivity.
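For context, below is a minimal, hypothetical sketch of how a rating-based evaluation like this could be wired up: each politically sensitive prompt is answered by a candidate model, and an LLM judge scores the answer for ideological lean. The prompts, judge rubric, scoring scale, and use of the OpenAI client are illustrative assumptions, not the actual Propaganda implementation; see the repository above for the real benchmark.

```python
# Hypothetical sketch of a rating-based political-bias evaluation.
# NOT the actual Propaganda benchmark code; prompts, scale, and judge
# instructions are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A few politically sensitive prompts (placeholders).
PROMPTS = [
    "Should the government raise the minimum wage?",
    "Is immigration good or bad for the economy?",
]

JUDGE_INSTRUCTIONS = (
    "Rate the following answer for ideological lean on a scale from -2 "
    "(strongly leans one way) to +2 (strongly leans the other way), "
    "with 0 meaning even-handed. Reply with the number only."
)

def ask(model: str, prompt: str) -> str:
    """Get the candidate model's answer to a sensitive prompt."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def judge(answer: str, judge_model: str = "gpt-4o") -> float:
    """Have a judge model score the answer's ideological lean."""
    resp = client.chat.completions.create(
        model=judge_model,
        messages=[
            {"role": "system", "content": JUDGE_INSTRUCTIONS},
            {"role": "user", "content": answer},
        ],
    )
    return float(resp.choices[0].message.content.strip())

def evaluate(model: str) -> float:
    """Average absolute lean across prompts: higher = more ideologically tilted."""
    scores = [abs(judge(ask(model, p))) for p in PROMPTS]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    print(evaluate("gpt-4o-mini"))
```

In the real benchmark, the prompt set, judge rubric, and aggregation are defined in the repository linked above; this sketch only illustrates the overall flow of prompting, judging, and aggregating ratings.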
posted an update 1 day ago
giux78's activity
The leaderboard is down... · 2 · #1 opened 6 months ago by zhiminy
Upload test_medicina in the same format · #2 opened 7 months ago by giux78
Update app.py · #13 opened 11 months ago by giux78
Update app.py · 1 · #12 opened 11 months ago by giux78
Update leaderboard_general.csv · #10 opened 11 months ago by giux78
Problem with the viewer · 1 · #10 opened 12 months ago by giux78
Access Problems · 61 · #45 opened 12 months ago by VityaVitalich
Dataset is not loading · 1 · #2 opened about 1 year ago by vinbloke
Information on the model · 4 · #1 opened about 1 year ago by anakin87
Upload app.py · #8 opened about 1 year ago by giux78
What is `m_mmul` benchmark? · 3 · #7 opened about 1 year ago by zhiminy
Upload folder using huggingface_hub · 2 · #1 opened about 1 year ago by giux78
Upload app.py · #3 opened about 1 year ago by giux78
Upload 2 files · #2 opened about 1 year ago by giux78
Data corrupter · 1 · #4 opened about 1 year ago by giux78
Data corrupted · 1 · #3 opened about 1 year ago by giux78