Hugging Face
syed abuthahir
developerabu
2 followers · 18 following
writerabu
abuvanth
AI & ML interests
None yet
Recent Activity
Reacted to ginigen-ai's post with 🔥, 1 day ago
The RoboCasa Kitchen Leaderboard

What does it take for a robot to handle kitchen chores the way a person does? It has to see (Vision), understand instructions (Language), and actually act (Action), and VLA (Vision-Language-Action) models are emerging as the answer. They're the bridge between large multimodal models and real-world embodied control.

RoboCasa Kitchen is a leading robot-learning benchmark in which a single-arm robot (Franka Panda) performs 24 atomic manipulation tasks (picking up cups and bowls, opening drawers and doors, turning faucets, pressing buttons, and more) inside a photorealistic simulated kitchen. Because the layout and object placement are randomized every episode, it tests genuine generalization rather than memorized motions. The score (success rate, SR) is the average fraction of the 24 tasks completed as instructed, measured over multiple seeds so results aren't down to luck.

The catch: this benchmark has no official leaderboard, and protocols (number of demonstrations, evaluation setup) differ from paper to paper, leaving scores scattered. Lining the numbers up naively quickly turns into an apples-to-oranges comparison. This leaderboard fixes that by collecting published scores with their sources and comparing only what is genuinely comparable.

It's split into three tables:
Kitchen 24-task (matched): head-to-head under identical conditions (per the RLDX-1 Technical Report). This is the core ranking you can actually trust.
Other protocols: self-reported under different setups (e.g. fewer demos). Not directly comparable, so kept separate.
GR1-Tabletop: a different, humanoid-based variant suite, separated to avoid confusion.

Any researcher can submit their own model's score directly, and submissions are reviewed before they appear on the board. Every number links to its source paper, so you can verify it yourself.

https://huggingface.co/spaces/ginigen-ai/robocasa-kitchen-leaderboard
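As a rough illustration of the scoring described in the post, the sketch below computes a success rate by averaging per-task success fractions within each seed and then averaging across seeds. The function name `success_rate`, the nested-dictionary input layout, and the exact order of averaging are assumptions made for this example; the post does not specify them, and individual papers may aggregate differently.

```python
from statistics import mean


def success_rate(results):
    """Average success rate over tasks, then over seeds.

    `results` is a hypothetical layout chosen for this sketch:
    {seed: {task_name: [True/False per evaluation episode]}}.
    """
    per_seed_scores = []
    for tasks in results.values():
        # Fraction of successful episodes for each task under this seed.
        per_task = [
            mean(1.0 if ok else 0.0 for ok in episodes)
            for episodes in tasks.values()
        ]
        # Unweighted mean over tasks (24 tasks in the benchmark).
        per_seed_scores.append(mean(per_task))
    # Averaging across seeds reduces the effect of a lucky run.
    return mean(per_seed_scores)


# Toy data with two seeds and two tasks (not real benchmark results).
toy_results = {
    0: {"open_drawer": [True, False], "press_button": [True, True]},
    1: {"open_drawer": [False, False], "press_button": [True, False]},
}
toy_sr = success_rate(toy_results)
```

With the toy data, seed 0 scores 0.75 and seed 1 scores 0.25, so the overall value is 0.5. Scores computed under different demonstration counts or evaluation setups would still not be comparable, which is the point the post makes.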
Reacted to danielhanchen's post, 7 days ago
1-bit GLM-5.2 GGUF vs. Claude 4.8 Opus vs. GPT-5.5

We gave 3 models the same prompt and compared one-shot outputs. The 1-bit GLM-5.2 GGUF ran locally on a Mac Studio M3 Ultra with 256GB RAM at ~21.6 tok/s. Which output do you like best?

GGUF: https://huggingface.co/unsloth/GLM-5.2-GGUF
Liked a model, 10 days ago: PatnaikAshish/kokoclone
Organizations
None yet
developerabu's activity
Upvoted a collection, 4 months ago
Qwen3.5 (Collection): 21 items • Updated Mar 9 • 1.7k