AI & ML interests

None defined yet.

Recent Activity

mdiazmel updated a dataset about 10 hours ago
fr-gouv-coordination-ia/requests
mdiazmel updated a dataset about 10 hours ago
fr-gouv-coordination-ia/results
BertrandCabotIDRIS updated a dataset 1 day ago
fr-gouv-coordination-ia/results

fr-gouv-coordination-ia's activity

nataliaElv posted an update 9 days ago
New chapter in the Hugging Face NLP course! 🤗 🚀

We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load & annotate datasets, and export them to the Hub.

Any feedback for improvements is welcome!

https://huggingface.co/learn/nlp-course/chapter10
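
A minimal sketch of the workflow that chapter walks through, using the Argilla 2.x Python SDK; the instance URL, API key, dataset name, and labels below are placeholders, and `to_hub` is the Hub export helper as I understand it in recent 2.x releases:

```python
import argilla as rg

# Connect to a running Argilla instance (URL and API key are placeholders).
client = rg.Argilla(api_url="https://my-argilla-space.hf.space", api_key="my-api-key")

# Define a simple annotation task: one text field, one label question.
settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.LabelQuestion(name="sentiment", labels=["positive", "negative"])],
)

# Create the dataset on the instance and load a couple of records to annotate.
dataset = rg.Dataset(name="nlp-course-demo", settings=settings)
dataset.create()
dataset.records.log(
    [
        {"text": "I loved this movie!"},
        {"text": "Not my cup of tea."},
    ]
)

# Once annotated in the UI, push records and annotations to the Hub.
dataset.to_hub(repo_id="my-username/nlp-course-demo")
```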
nataliaElv posted an update 17 days ago
Do you want to easily save annotations to a Dataset on the Hub?

In the latest version of Argilla (v2.6.0), you can export your data directly from the UI to the Hub.

Check all the changes and update to the latest version: https://github.com/argilla-io/argilla/releases/tag/v2.6.0
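
If you prefer code over the UI, the same export can be sketched with the Python SDK (instance URL, API key, dataset name, and repo id are placeholders; `to_hub` is the export helper as I understand it in Argilla 2.x):

```python
import argilla as rg

client = rg.Argilla(api_url="https://my-argilla-space.hf.space", api_key="my-api-key")

# Fetch an already-annotated dataset from the Argilla instance...
dataset = client.datasets(name="my-annotated-dataset")

# ...and push its records and annotations to the Hugging Face Hub.
dataset.to_hub(repo_id="my-username/my-annotated-dataset")
```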
nataliaElv posted an update about 1 month ago
If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines, or how Argilla works, this is your video!

I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!

https://www.youtube.com/watch?v=_-ORB4WAVGU
nataliaElv posted an update about 2 months ago
How do your annotations for FineWeb2 compare to your teammates'?

I started contributing some annotations to the FineWeb2 collaborative annotation sprint and I wanted to know if my labelling trends were similar to those of my teammates.

I did some analysis and I wasn't surprised to see that I'm being a bit harsher in my evaluations than my mates 😂


Do you want to see how your annotations compare to others?
👉 Go to this Gradio space: nataliaElv/fineweb2_compare_my_annotations
✏️ Enter the dataset that you've contributed to and your Hugging Face username.

How were your results?
- Contribute some annotations: data-is-better-together/fineweb-c
- Join your language channel in Rocket chat: HuggingFaceFW/discussion
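
Out of curiosity, a rough sketch of how such a comparison can be done offline (this is not the code behind the Gradio space; the labels and annotations below are made up for illustration):

```python
from collections import Counter

from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two annotators on the same samples
# (label names are illustrative, not the official guideline categories).
my_labels = ["None", "Minimal", "Good", "None", "Minimal", "None"]
teammate_labels = ["Minimal", "Minimal", "Good", "None", "Good", "Minimal"]

# Compare how often each annotator picks each label.
print("mine:    ", Counter(my_labels))
print("teammate:", Counter(teammate_labels))

# Cohen's kappa: 1.0 = perfect agreement, 0.0 = what you'd expect by chance.
print("agreement (kappa):", cohen_kappa_score(my_labels, teammate_labels))
```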
nataliaElv posted an update about 2 months ago
We're so close to reaching 100 languages! Can you help us cover the remaining 200? Check if we're still looking for language leads for your language: nataliaElv/language-leads-dashboard
frascuchon posted an update about 2 months ago
🚀 Argilla v2.5.0 is out! 🎉
We're excited to announce the latest version of Argilla, packed with features to make your data annotation workflows more powerful and seamless. Here's what's new:

✨ 1. Argilla Webhooks
With Argilla webhooks, you can (a minimal receiver sketch follows at the end of this post):
* Trigger custom workflows
* Seamlessly integrate with external tools
* Build custom event-driven pipelines

๐Ÿ 2. Support for Python 3.13 and Pydantic v2
Argilla v2.5.0 now runs on:
* Python 3.13 for enhanced compatibility and speed
* Pydantic v2 for improved performance and type validation

🎨 3. Redesigned Home Page
Argilla's home page has been redesigned to provide a better user experience: a new dataset card view gives a clearer overview of your datasets and annotation progress.

📖 Read the full release notes 👉 https://github.com/argilla-io/argilla/releases/tag/v2.5.0
⬇️ Update now 👉 https://pypi.org/project/argilla
or use the live demo 👉 argilla/argilla-template-space
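
To illustrate the event-driven idea from feature 1 (this is not the Argilla SDK's own webhook helper, just a generic FastAPI endpoint; the route, event type, and payload fields are assumptions):

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/argilla-webhook")
async def handle_event(request: Request):
    # Argilla POSTs a JSON payload describing each event; the exact schema and
    # event names are in the v2.5.0 docs (field names used here are assumptions).
    event = await request.json()
    if event.get("type") == "response.created":
        # Trigger whatever your pipeline needs: retraining, syncing to an
        # external tool, notifications, ...
        print("New annotation received:", event)
    return {"ok": True}
```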
nataliaElv posted an update about 2 months ago
Would you like to get a high-quality dataset to pre-train LLMs in your language? 🌍

At Hugging Face we're preparing a collaborative annotation effort to build an open-source multilingual dataset as part of the Data is Better Together initiative.

Follow the link below, check if your language is listed and sign up to be a Language Lead!

https://forms.gle/s9nGajBh6Pb9G72J6
nataliaElv posted an update 2 months ago
You can now add your Bluesky handle to your Hugging Face profile! 🦋
Have you noticed?
clefourrier posted an update 9 months ago
In a basic chatbot, errors are annoyances. In medical LLMs, errors can have life-threatening consequences 🩸

It's therefore vital to benchmark/follow advances in medical LLMs before even thinking about deployment.

This is why a small research team introduced a medical LLM leaderboard, to get reproducible and comparable results between LLMs, and allow everyone to follow advances in the field.

openlifescienceai/open_medical_llm_leaderboard

Congrats to @aaditya and @pminervini!
Learn more in the blog: https://huggingface.co/blog/leaderboard-medicalllm
clefourrier posted an update 9 months ago
Contamination-free code evaluations with LiveCodeBench! 🖥️

LiveCodeBench is a new leaderboard, which contains:
- complete code evaluations (on code generation, self repair, code execution, tests)
- my favorite feature: problem selection by publication date 📅

This feature means that you can get model scores averaged only on new problems that fall outside the training data. This means... contamination-free code evals! 🚀
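
The idea behind that filter is easy to sketch: keep only problems published after a model's training-data cutoff and average scores over that slice (problem names, dates, and the cutoff below are illustrative, not LiveCodeBench's actual data):

```python
from datetime import date

# Illustrative per-problem results: (problem id, publication date, solved?)
results = [
    ("two-sum-variant", date(2023, 6, 1), True),
    ("graph-coloring-x", date(2023, 11, 12), True),
    ("parser-repair-17", date(2024, 2, 3), False),
    ("dp-on-trees-42", date(2024, 3, 20), True),
]

# Only score problems released after the model's training cutoff,
# so they cannot have leaked into its training data.
training_cutoff = date(2023, 9, 1)
fresh = [solved for _, published, solved in results if published > training_cutoff]

print(f"contamination-free pass rate: {sum(fresh) / len(fresh):.2f} over {len(fresh)} problems")
```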

Check it out!

Blog: https://huggingface.co/blog/leaderboard-livecodebench
Leaderboard: livecodebench/leaderboard

Congrats to @StringChaos @minimario @xu3kev @kingh0730 and @FanjiaYan for the super cool leaderboard!
clefourrier posted an update 9 months ago
🆕 Evaluate your RL agents - who's best at Atari? 🏆

The new RL leaderboard evaluates agents in 87 possible environments (from Atari 🎮 to motion control simulations 🚶 and more)!

When you submit your model, it's run and evaluated in real time - and the leaderboard displays small videos of the best model's run, which is super fun to watch! ✨

Kudos to @qgallouedec for creating and maintaining the leaderboard!
Let's find out which agent is the best at games! 🚀

open-rl-leaderboard/leaderboard
clefourrier posted an update 10 months ago
Fun fact about evaluation, part 2!

How much do scores change depending on prompt format choice?

Using different prompts (all present in the literature, ranging from a bare "prompt question?" to "Question: prompt question?\nChoices: enumeration of all choices\nAnswer: "), we get a score range of...

10 points for a single model!
Keep in mind that we only changed the prompt, not the evaluation subsets, etc.
Again, this confirms that evaluation results reported without their details are basically bullshit.

Prompt format is on the x-axis of the chart; all these evals look at the logprob of either "choice A/choice B..." or "A/B...".

Incidentally, it also changes model rankings - so a "best" model might only be best on one type of prompt...
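
To make the setup concrete, here is a rough sketch of scoring the same multiple-choice question under two prompt formats via choice logprobs; `choice_logprob` is a hypothetical stand-in for whatever your eval harness provides:

```python
# Two prompt formats from the literature for the same multiple-choice item.
question = "What is the capital of France?"
choices = ["Berlin", "Paris", "Rome"]

bare_prompt = f"{question}\n"
verbose_prompt = (
    f"Question: {question}\n"
    f"Choices: {', '.join(choices)}\n"
    "Answer: "
)

def choice_logprob(prompt: str, continuation: str) -> float:
    """Hypothetical stand-in: return the model's log-probability of
    `continuation` given `prompt` (an eval harness would provide this)."""
    raise NotImplementedError

def predict(prompt: str) -> str:
    # Depending on the eval, you score either the full choice text ("Paris")
    # or just the letter ("B"); here we score the full choice text.
    return max(choices, key=lambda c: choice_logprob(prompt, c))

# Same model, same questions: only the prompt string changes between
# predict(bare_prompt) and predict(verbose_prompt), yet benchmark
# accuracy can shift by several points.
```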