ZhengQi Wan
Vanqi
AI & ML interests
None yet
Recent Activity
updated a collection about 1 hour ago: From Vision to Motion
updated a collection 2 days ago: From Vision to Motion
updated a collection 2 days ago: Interesting work but not directly related
Organizations
None yet
From Vision to Motion
- HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
  Paper • 2603.17024 • Published • 109
- WorldAgents: Can Foundation Image Models be Agents for 3D World Models?
  Paper • 2603.19708 • Published • 13
- MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data
  Paper • 2603.25319 • Published • 32
- ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions
  Paper • 2603.25791 • Published • 7
Interesting work but not directly related
- VOID: Video Object and Interaction Deletion
  Paper • 2604.02296 • Published • 54
- OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
  Paper • 2604.18486 • Published • 90
- WildDet3D: Scaling Promptable 3D Detection in the Wild
  Paper • 2604.08626 • Published • 245
- UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
  Paper • 2604.19734 • Published • 29
models 0
None public yet
datasets 0
None public yet