ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities (Paper • 2408.04682 • Published Aug 8, 2024)
UI Agent (Collection): a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics • 347 items • Updated 2 days ago