AI & ML interests

None defined yet.

Recent Activity

fr-gouv-coordination-ia's activity

nataliaElv posted an update 8 days ago
If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video!

I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!

https://www.youtube.com/watch?v=_-ORB4WAVGU
nataliaElv posted an update 14 days ago
How do your annotations for FineWeb2 compare to your teammates'?

I started contributing some annotations to the FineWeb2 collaborative annotation sprint and I wanted to know if my labelling trends were similar to those of my teammates.

I did some analysis and I wasn't surprised to see that I'm being a bit harsher on my evaluations than my mates 😂


Do you want to see how your annotations compare to others?
👉 Go to this Gradio space: nataliaElv/fineweb2_compare_my_annotations
✍️ Enter the dataset you've contributed to and your Hugging Face username. (A rough local version of this comparison is sketched in the code after this post.)

How were your results?
- Contribute some annotations: data-is-better-together/fineweb-c
- Join your language channel in Rocket chat: HuggingFaceFW/discussion
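For readers who prefer a local check, here is a rough sketch of the same comparison using the datasets library. The language config name and the column names below are assumptions, so adjust them to what the dataset card actually lists.

```python
from collections import Counter
from datasets import load_dataset

# Assumptions (check the dataset card): the config name and the column names
# "annotator_ids" / "educational_value_labels" may differ from what is shown here.
LANG_CONFIG = "fra_Latn"  # hypothetical config name -- pick yours from the dataset card

ds = load_dataset("data-is-better-together/fineweb-c", LANG_CONFIG, split="train")

per_annotator = {}
for row in ds:
    # Assumed layout: parallel lists of annotator ids and their labels per document.
    for annotator, label in zip(row["annotator_ids"], row["educational_value_labels"]):
        per_annotator.setdefault(annotator, Counter())[label] += 1

for annotator, counts in per_annotator.items():
    total = sum(counts.values())
    print(annotator, {label: round(n / total, 2) for label, n in counts.items()})
```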
nataliaElv posted an update 22 days ago
We're so close to reaching 100 languages! Can you help us cover the remaining 200? Check if we're still looking for language leads for your language: nataliaElv/language-leads-dashboard
frascuchon posted an update 26 days ago
🚀 Argilla v2.5.0 is out! 🎉
We're excited to announce the latest version of Argilla, packed with features to make your data annotation workflows more powerful and seamless. Here's what's new:

✨ 1. Argilla Webhooks
With Argilla webhooks, you can:
* Trigger custom workflows
* Seamlessly integrate with external tools
* Build custom event-driven pipelines

๐Ÿ 2. Support for Python 3.13 and Pydantic v2
Argilla v2.5.0 now runs on:
* Python 3.13 for enhanced compatibility and speed
* Pydantic v2 for improved performance and type validation

🎨 3. Redesigned Home Page
Argilla's home page has been redesigned to provide a better user experience: a new dataset card view gives a clearer overview of your datasets and annotation progress.

📖 Read the full release notes 👉 https://github.com/argilla-io/argilla/releases/tag/v2.5.0
⬇️ Update now 👉 https://pypi.org/project/argilla
or use the live demo 👉 argilla/argilla-template-space
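For reference, a minimal upgrade-and-connect sketch with the v2 Python SDK; the URL, API key, and dataset name are placeholders for your own deployment, and the webhook API itself is covered in the release notes linked above.

```python
# Upgrade the SDK first:  pip install -U argilla
import argilla as rg

# Connect to your Argilla server (placeholders: use your own Space URL and API key).
client = rg.Argilla(
    api_url="https://<your-argilla-space>.hf.space",
    api_key="<your-api-key>",
)

# Retrieve one of your datasets by name (placeholder name) to check the connection.
# The new webhook API itself is documented in the v2.5.0 release notes linked above.
dataset = client.datasets(name="my_annotation_dataset")
print(dataset)
```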
nataliaElv posted an update 28 days ago
Would you like to get a high-quality dataset to pre-train LLMs in your language? 🌍

At Hugging Face we're preparing a collaborative annotation effort to build an open-source multilingual dataset as part of the Data is Better Together initiative.

Follow the link below, check if your language is listed and sign up to be a Language Lead!

https://forms.gle/s9nGajBh6Pb9G72J6
nataliaElv posted an update 30 days ago
You can now add your Bluesky handle to your Hugging Face profile! 🦋
Have you noticed?
clefourrier posted an update 8 months ago
In a basic chatbot, errors are annoyances. In medical LLMs, errors can have life-threatening consequences 🩸

It's therefore vital to benchmark/follow advances in medical LLMs before even thinking about deployment.

This is why a small research team introduced a medical LLM leaderboard, to get reproducible and comparable results between LLMs, and allow everyone to follow advances in the field.

openlifescienceai/open_medical_llm_leaderboard

Congrats to @aaditya and @pminervini !
Learn more in the blog: https://huggingface.co/blog/leaderboard-medicalllm
clefourrier posted an update 8 months ago
Contamination-free code evaluations with LiveCodeBench! 🖥️

LiveCodeBench is a new leaderboard, which contains:
- complete code evaluations (on code generation, self repair, code execution, tests)
- my favorite feature: problem selection by publication date 📅

This feature means that you can average model scores only over problems released after a model's training data was collected. In other words: contamination-free code evals! 🚀

Check it out!

Blog: https://huggingface.co/blog/leaderboard-livecodebench
Leaderboard: livecodebench/leaderboard

Congrats to @StringChaos @minimario @xu3kev @kingh0730 and @FanjiaYan for the super cool leaderboard!
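The date-filtering idea can be sketched in a few lines: keep only the problems released after a chosen cutoff and average over those. The data layout below is purely illustrative, not LiveCodeBench's actual schema.

```python
# Illustrative sketch of date-windowed scoring (not LiveCodeBench's actual schema).
from datetime import date

# Each entry: (problem release date, 1.0 if the model solved it else 0.0)
results = [
    (date(2023, 9, 10), 1.0),
    (date(2023, 12, 2), 0.0),
    (date(2024, 2, 20), 1.0),
    (date(2024, 3, 15), 0.0),
]

def score_after(cutoff: date, results) -> float:
    """Average accuracy restricted to problems released after `cutoff`."""
    kept = [solved for released, solved in results if released > cutoff]
    return sum(kept) / len(kept) if kept else float("nan")

# Only problems released after the (hypothetical) training cutoff count,
# which is what makes the evaluation contamination-free.
print(score_after(date(2024, 1, 1), results))  # -> 0.5
```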
clefourrier posted an update 8 months ago
🆕 Evaluate your RL agents - who's best at Atari? 🏆

The new RL leaderboard evaluates agents in 87 possible environments (from Atari 🎮 to motion control simulations 🚶 and more)!

When you submit your model, it's run and evaluated in real time - and the leaderboard displays small videos of the best model's run, which is super fun to watch! ✨

Kudos to @qgallouedec for creating and maintaining the leaderboard!
Let's find out which agent is the best at games! ๐Ÿš€

open-rl-leaderboard/leaderboard
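For intuition, here is a minimal sketch of the rollout-style evaluation such a leaderboard performs, using a random policy on a Gymnasium environment (CartPole here to avoid extra Atari dependencies); the leaderboard itself runs submitted agents, not a random policy.

```python
# Minimal sketch: evaluate a (here random) policy by its average episode return.
# The leaderboard evaluates real submitted agents; this only shows the loop shape.
import gymnasium as gym

env = gym.make("CartPole-v1")  # Atari envs would additionally need ale-py
n_episodes = 10
returns = []

for _ in range(n_episodes):
    obs, info = env.reset()
    done, episode_return = False, 0.0
    while not done:
        action = env.action_space.sample()  # a trained agent would pick the action here
        obs, reward, terminated, truncated, info = env.step(action)
        episode_return += reward
        done = terminated or truncated
    returns.append(episode_return)

env.close()
print(f"mean return over {n_episodes} episodes: {sum(returns) / n_episodes:.1f}")
```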
clefourrier posted an update 9 months ago
Fun fact about evaluation, part 2!

How much do scores change depending on prompt format choice?

Using different prompt formats (all present in the literature, ranging from the bare question to "Question: <question>\nChoices: <enumeration of all choices>\nAnswer:"), we get a score range of...

10 points for a single model!
Keep in mind that we only changed the prompt, not the evaluation subsets, etc.
Again, this confirms that evaluation results reported without their details are basically bullshit.

Prompt format is on the x axis of the attached figure; all these evals look at the logprob of either "choice A/choice B..." or "A/B...".

Incidentally, it also changes model rankings - so a "best" model might only be best on one type of prompt...
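To make the variable concrete, here is a small illustrative sketch of what "only changing the prompt format" means for one and the same sample; the formats are paraphrases of the ones mentioned above, not the exact evaluation code.

```python
# Illustrative sketch: the same sample rendered under different prompt formats.
# Scores are then computed from the logprob of each choice under each format.
sample = {
    "question": "What is the capital of France?",
    "choices": ["Berlin", "Madrid", "Paris", "Rome"],
}

def format_bare(s):
    return f"{s['question']}\n"

def format_question_prefix(s):
    return f"Question: {s['question']}\nAnswer: "

def format_with_choices(s):
    letters = "ABCD"
    choices = "\n".join(f"{l}. {c}" for l, c in zip(letters, s["choices"]))
    return f"Question: {s['question']}\nChoices:\n{choices}\nAnswer: "

for fmt in (format_bare, format_question_prefix, format_with_choices):
    print(f"--- {fmt.__name__} ---")
    print(fmt(sample))
```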
clefourrier posted an update 9 months ago
Fun fact about evaluation!

Did you know that, if you evaluate the same model, with the same prompt formatting & the same fixed few-shot examples, only changing
♻️ the order in which the few-shot examples are added to the prompt ♻️
you get a difference of up to 3 points in evaluation score?

I did a small experiment using some MMLU subsets on the best performing 7B and lower pretrained models from the leaderboard.

I tried 8 different prompting methods (containing more or less information, such as just the question, or Question: question, or Question: question Choices: ..., see the x axis) that are commonly used in evaluation.

I then compared the results for all these methods, in 5-shot, across 2 runs. The *only difference* between the first and second run was that the few-shot samples were not introduced in the same order.
For example, run 1 would be "A B C D E Current sample", vs. "D C E A B Current sample" in run 2.
All the other experiment parameters stayed exactly the same.

As you can see in the attached picture, you get a difference of up to 3 points between the two few-shot orderings.

So, when just changing *the order of the few shot samples* can change your results by several points, what is the impact of all other "minimal" and unreported prompting changes?

-> Any kind of model score, provided without an evaluation script for reproducibility, is basically bullshit (or comms).
-> This is why we need reproducible evaluation in a fair, strictly identical setup, using evaluation suites such as lm_eval from the Harness, lighteval from HF, or the Open LLM Leaderboard.
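The ordering effect is easy to picture at the prompt level: the two prompts below contain exactly the same five few-shot examples and the same current sample, only their order differs. This is a schematic sketch, not the actual lm_eval/lighteval code.

```python
# Schematic sketch: identical 5-shot examples, two different orderings.
import random

few_shot = [
    ("Q: 2 + 2 = ?\nA:", " 4"),
    ("Q: The capital of Italy is?\nA:", " Rome"),
    ("Q: Water freezes at?\nA:", " 0 degrees Celsius"),
    ("Q: 3 * 3 = ?\nA:", " 9"),
    ("Q: The largest planet is?\nA:", " Jupiter"),
]
current = "Q: The capital of France is?\nA:"

def build_prompt(examples, current_sample):
    shots = "\n\n".join(q + a for q, a in examples)
    return shots + "\n\n" + current_sample

run_1 = build_prompt(few_shot, current)   # original order
shuffled = few_shot.copy()
random.Random(0).shuffle(shuffled)        # same examples, different order
run_2 = build_prompt(shuffled, current)

# Same information, different order -- and reported scores can move by ~3 points.
print(run_1 == run_2)  # False
```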
clefourrier posted an update 9 months ago
Are you looking for the perfect leaderboard/arena for your use case? 👀

There's a new tool for this!
https://huggingface.co/spaces/leaderboards/LeaderboardFinder

Select your modality, language, task... then search! 🔍
Some categories of interest:
- does the leaderboard accept submissions?
- is the test set private or public?
- is it using an automatic metric, human evaluators, or llm as a judge?

The Spaces list is built from Space metadata and reloaded every hour.

Enjoy!
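A rough sketch of the underlying idea with huggingface_hub, querying Spaces from Hub metadata; the search term is only an example, and LeaderboardFinder itself relies on richer Space metadata refreshed hourly.

```python
# Rough sketch: discover leaderboard Spaces from Hub metadata.
# LeaderboardFinder itself uses richer Space metadata (tags), refreshed hourly.
from huggingface_hub import HfApi

api = HfApi()
spaces = api.list_spaces(search="leaderboard", limit=20)  # example query only

for space in spaces:
    print(space.id)
```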
clefourrier posted an update 9 months ago
How talkative is your chatbot about your internal data? 😬

As more chatbots get deployed in production, with access to internal databases, we need to make sure they don't leak private information to anyone interacting with them.

The Lighthouz AI team therefore introduced the Chatbot Guardrails Arena to stress test models and see how well guarded your private information is.
Anyone can try to make models reveal information they should not share 😈
(which is quite fun to do for the strongest models)!

The votes will then be gathered to create an Elo ranking of the safest models with respect to PII.

In the future, with the support of the community, this arena could inform the safety choices companies make when selecting models and guardrails, based on their resistance to adversarial attacks.
It's also a good way to easily demonstrate the limitations of current systems!

Check out the arena: lighthouzai/guardrails-arena
Learn more in the blog: https://huggingface.co/blog/arena-lighthouz
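For reference, turning pairwise votes into a ranking is typically a standard Elo update; below is a minimal sketch, with K=32 and a 1500 starting rating as conventional defaults rather than the arena's exact settings.

```python
# Minimal Elo update sketch: turn pairwise "which model leaked less" votes
# into ratings. K and the 1500 start value are conventional defaults,
# not necessarily the arena's exact settings.
def expected(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    score_a = 1.0 if a_won else 0.0
    e_a = expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b - k * (score_a - e_a)

ratings = {"model_a": 1500.0, "model_b": 1500.0}
# One vote: model_a kept the private info safe, model_b leaked it.
ratings["model_a"], ratings["model_b"] = update(ratings["model_a"], ratings["model_b"], a_won=True)
print(ratings)  # model_a moves above model_b
```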
clefourrier posted an update 10 months ago
🔥 New multimodal leaderboard on the hub: ConTextual!

Many situations require models to parse images containing text: maps, web pages, real world pictures, memes, ... 🖼️
So how do you evaluate performance on this task?

The ConTextual team introduced a brand new dataset of instructions and images to test the reasoning capabilities of LMMs (large multimodal models), along with an associated leaderboard (with a private test set).

This is super exciting imo because it has the potential to be a good benchmark both for multimodal models and for assistants' vision capabilities, thanks to the instructions in the dataset.

Congrats to @rohan598 , @hbXNov , @kaiweichang and @violetpeng !!

Learn more in the blog: https://huggingface.co/blog/leaderboard-contextual
Leaderboard: ucla-contextual/contextual_leaderboard
clefourrier posted an update 10 months ago
First big community contribution to our evaluation suite, lighteval ⛅️

@Ali-C137 added 3 evaluation tasks in Arabic:
- ACVA, a benchmark about Arabic culture
- MMLU, translated
- Exams, translated
(datasets provided/translated by the AceGPT team)

Congrats to them!
https://github.com/huggingface/lighteval/pull/44
clefourrier posted an update 10 months ago
🔥 New LLM leaderboard blog: Open Ko LLM!

One of the oldest leaderboards on the hub, it has already evaluated more than 1000 models! It uses Korean translations of MMLU, ARC, HellaSwag, TruthfulQA, and a new dataset, Korean CommonGen, focused on Korean-specific common-sense alignment.

upstage/open-ko-llm-leaderboard

What's interesting about this leaderboard is how it drove LLM development in Korea, with an average of about 4 model submissions per day since it started!
Really looking forward to seeing similar initiatives in other languages, to help quality models emerge beyond "just English" (for the other two-thirds of the world).

Read more about how the leaderboard works in the intro blog: https://huggingface.co/blog/leaderboards-on-the-hub-upstage
Congrats to @Chanjun , @hunkim and the Upstage team!