Hugging Face
Toby Drane
tobydrane
2 followers · 0 following
AI & ML interests
None yet
Recent Activity
replied to kaikaidai's post 20 days ago
Early results on the 8B evaluation model we've been training... @NinaCalvi wrote about the progress we've made this quarter towards training the best 'LLM-as-a-judge' evaluator. We've significantly improved against the baseline and are approaching state-of-the-art evaluation performance with an 8B model. Next up: training Llama-3.1-70B. Here's the full article: https://www.atla-ai.com/post/evaluating-the-evaluator
liked a Space about 1 month ago
AtlaAI/judge-arena
updated a Space about 2 months ago
AtlaAI/judge-arena
Articles
Judge Arena: Benchmarking LLMs as Evaluators
Nov 19 • 52
Organizations
models
None public yet
datasets
None public yet