Submitted by zhihou 40 Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy · 11 authors 1
Submitted by KennyUTC 27 LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? · 9 authors 1
Submitted by akhaliq 22 Open Deep Search: Democratizing Search with Open-source Reasoning Agents · 12 authors 2
Submitted by phillipinseoul 19 Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models · 4 authors 1
Submitted by msj9817 14 GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers · 6 authors 1
Submitted by Awiny 11 BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation · 9 authors 2
Submitted by yilunzhao 7 MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search · 4 authors 1
Submitted by Concyclics 7 LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation · 7 authors 1
Submitted by aejion 5 AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset · 6 authors 1
Submitted by Ningyu 3 ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems · 7 authors 1
Submitted by hahahawu 3 Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging · 10 authors 1
Submitted by Awiny 3 Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models · 5 authors 1
Submitted by r0nn13 3 Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image · 2 authors 1
Submitted by ya-mehdi 3 Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs · 8 authors 1
Submitted by akhaliq 2 Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals · 7 authors 1
Submitted by johanobandoc 2 Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training · 10 authors 1
Submitted by Jarvis1111 1 UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis · 3 authors 1
Submitted by aadarsh-ram 1 RONA: Pragmatically Diverse Image Captioning with Coherence Relations · 3 authors 1
Submitted by SteveZeyuZhang - PathoHR: Breast Cancer Survival Prediction on High-Resolution Pathological Images · 10 authors 1