InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published about 20 hours ago • 153
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images Paper • 2504.08727 • Published 4 days ago • 7
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 4 days ago • 98
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 7 days ago • 141
EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling Paper • 2504.02402 • Published 12 days ago • 5
Articulated Kinematics Distillation from Video Diffusion Models Paper • 2504.01204 • Published 14 days ago • 23
SketchVideo: Sketch-based Video Generation and Editing Paper • 2503.23284 • Published 16 days ago • 22
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization Paper • 2503.19901 • Published 21 days ago • 35
MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs Paper • 2503.23022 • Published 17 days ago • 7
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling Paper • 2503.21732 • Published 19 days ago • 8
Hi3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging Paper • 2503.22236 • Published 18 days ago • 11
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Paper • 2503.21696 • Published 19 days ago • 21
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Paper • 2503.21755 • Published 19 days ago • 31
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting Paper • 2503.17032 • Published 25 days ago • 24
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes Paper • 2503.16375 • Published 26 days ago • 9