Grégoire Mialon

gregmialz

AI & ML interests

Self-supervised learning, Augmented LLMs

gregmialz's activity

upvoted a paper about 2 months ago

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20 • 190

liked a Space 9 months ago

Nexus Function Calling Leaderboard

🐠

Visualize model performance on function calling tasks

liked a Space about 1 year ago

GAIA Leaderboard

🦾

Submit models for evaluation and view leaderboard results

liked 2 datasets about 1 year ago

gaia-benchmark/GAIA

Updated Feb 13 • 11.5k • 294

m-a-p/Code-Feedback

Viewer • Updated Feb 26, 2024 • 66.4k • 273 • 206

New activity in gaia-benchmark/GAIA about 1 year ago

where is the score function?

#3 opened over 1 year ago by eyuansu71

upvoted a paper about 1 year ago

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Paper • 2402.07456 • Published Feb 12, 2024 • 45

New activity in gaia-benchmark/leaderboard about 1 year ago

Possible future contamination problem

#7 opened about 1 year ago by supercharge19

reacted to clefourrier's post with 🤗 about 1 year ago

Post

🏅 New top model on the GAIA benchmark!

Called FRIDAY, it's a mysterious new autonomous agent that achieved strong performance on both the public validation set *and* the private test set.
It notably passed 10 points on the validation set and 5 points on the test set for our hardest questions (level 3), which require taking arbitrarily long sequences of actions, using any number of tools, and accessing the world in general! ✨

The GAIA benchmark evaluates next-generation LLMs (LLMs with augmented capabilities from added tooling, efficient prompting, access to search, etc.) and was co-authored by @gregmialz @ThomasNLG @ylecun @thomwolf and myself: gaia-benchmark/leaderboard