Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation Paper • 2606.26907 • Published 5 days ago • 46
Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models Paper • 2606.25041 • Published 7 days ago • 103
JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence Paper • 2606.14777 • Published 20 days ago • 206
UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer Paper • 2606.16255 • Published 15 days ago • 14
Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation Paper • 2606.17030 • Published 15 days ago • 32
Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions Paper • 2606.09076 • Published 22 days ago • 63
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models Paper • 2606.11025 • Published 21 days ago • 41
SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer Paper • 2605.30409 • Published May 28 • 41
Representation Forcing for Bottleneck-Free Unified Multimodal Models Paper • 2605.31604 • Published May 29 • 63
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published May 28 • 146
Geo-Align: Video Generation Alignment via Metric Geometry Reward Paper • 2605.23903 • Published May 22 • 10
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published May 20 • 111
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published May 14 • 91