OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions Paper • 2602.05843 • Published 4 days ago • 51
HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing Paper • 2601.21459 • Published 12 days ago • 9
SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training Paper • 2602.03411 • Published 6 days ago • 35
SWE-World: Building Software Engineering Agents in Docker-Free Environments Paper • 2602.03419 • Published 6 days ago • 39
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 11 days ago • 149
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published 7 days ago • 124
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 22 days ago • 49
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning Paper • 2601.06002 • Published Jan 9 • 52
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published Jan 6 • 47
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published Jan 6 • 47
OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification Paper • 2512.10756 • Published Dec 11, 2025 • 35
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Paper • 2512.10739 • Published Dec 11, 2025 • 47
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published Dec 4, 2025 • 80
T-Eval: Evaluating the Tool Utilization Capability Step by Step Paper • 2312.14033 • Published Dec 21, 2023 • 2
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding Paper • 2312.14074 • Published Dec 21, 2023
Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection Paper • 2403.15317 • Published Mar 22, 2024 • 1
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition Paper • 2403.17695 • Published Mar 26, 2024 • 1
Are We on the Right Way for Evaluating Large Vision-Language Models? Paper • 2403.20330 • Published Mar 29, 2024 • 6
DETRDistill: A Universal Knowledge Distillation Framework for DETR-families Paper • 2211.10156 • Published Nov 17, 2022
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use Paper • 2501.02506 • Published Jan 5, 2025 • 10