Submitted by zhoutianyi 36 CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing · 4 authors 7
Submitted by sinwang 28 World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning · 7 authors 5
Submitted by agwmon 27 Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models · 5 authors 1
Submitted by Owen777 22 CoRe^2: Collect, Reflect and Refine to Generate Better and Faster · 7 authors 3
Submitted by LucasFang 21 GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing · 12 authors 1
Submitted by wondervictor 15 GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding · 10 authors 1
Submitted by ChenyangLyu 14 New Trends for Modern Machine Translation with Large Reasoning Models · 6 authors 1
Submitted by wenhu 10 VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search · 7 authors 1
Submitted by yyf86 8 DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation · 9 authors 1
Submitted by VityaVitalich 8 Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark · 6 authors 1
Submitted by akhaliq 8 Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k · 32 authors 1
Submitted by EthanTaylor 6 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models · 8 authors 1
Submitted by sayakpaul 6 SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation · 9 authors 1
Submitted by akhaliq 5 Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond · 14 authors 1
Submitted by BestWishYsh 5 CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance · 10 authors 1
Submitted by allisonandreyev 5 Quantization for OpenAI's Whisper Models: A Comparative Analysis · 1 author 1
Submitted by akhaliq 4 R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization · 12 authors 1
Submitted by hp-l33 4 Autoregressive Image Generation with Randomized Parallel Decoding · 4 authors 1
Submitted by hkchengrex 3 The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation · 2 authors 1
Submitted by Weiyun1025 2 VisualPRM: An Effective Process Reward Model for Multimodal Reasoning · 15 authors 1
Submitted by imranraad 2 "Silent Is Not Actually Silent": An Investigation of Toxicity on Bug Report Discussion · 2 authors 1
Submitted by chenblin26 1 ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer · 6 authors 1
Submitted by Nikolai10 1 PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling · 6 authors 1
Submitted by AhmadMustafa - On the Limitations of Vision-Language Models in Understanding Image Transforms · 3 authors 1