ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both Paper • 2605.15198 • Published 4 days ago • 17
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms Paper • 2604.23775 • Published 22 days ago • 45
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing Paper • 2604.04911 • Published Apr 6 • 36
Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR Paper • 2603.24840 • Published Mar 25 • 2
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models Paper • 2602.20309 • Published Feb 23 • 16 • 4
CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video Paper • 2603.04291 • Published Mar 4 • 15
Article What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware RakshitAralimatti • Aug 8, 2025 • 35
Improving Data and Reward Design for Scientific Reasoning in Large Language Models Paper • 2602.08321 • Published Feb 9 • 43
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs Paper • 2512.14698 • Published Dec 16, 2025 • 22
LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Image and Video Generation Paper • 2508.03485 • Published Aug 5, 2025 • 2
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published Oct 22, 2025 • 30
Video-As-Prompt Collection The model zoo for "Video-As-Prompt: Unified Semantic Control for Video Generation" • 3 items • Updated Oct 27, 2025 • 14