LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Paper • 2503.19990 • Published 13 days ago • 33
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024 • 48