CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies Paper • 2606.16613 • Published 14 days ago
Confidence-Aware Tool Orchestration for Robust Video Understanding Paper • 2606.26904 • Published 4 days ago
JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting Paper • 2606.18394 • Published 4 days ago
Lite Any Stereo V2: Faster and Stronger Efficient Zero-Shot Stereo Matching Paper • 2606.24457 • Published 6 days ago
Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do Paper • 2606.22565 • Published 8 days ago
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems Paper • 2606.24937 • Published 7 days ago
QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging Paper • 2606.20027 • Published 11 days ago
LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis Paper • 2602.09379 • Published 18 days ago
Qwen-AgentWorld: Language World Models for General Agents Paper • 2606.24597 • Published 6 days ago
Deep Research in Physical Sciences: A Multi-Agent Framework and Comprehensive Benchmark Paper • 2606.18648 • Published 12 days ago
PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems Paper • 2606.22388 • Published 8 days ago
PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models Paper • 2606.19534 • Published 12 days ago