PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Paper • 2501.16411 • Published 7 days ago • 17
Running on Zero • 1.31k • Chat With Janus-Pro-7B: A unified multimodal understanding and generation model.
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published 11 days ago • 33
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published 16 days ago • 23
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks Paper • 2501.11733 • Published 14 days ago • 27
GameFactory: Creating New Games with Generative Interactive Videos Paper • 2501.08325 • Published 20 days ago • 61
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published 18 days ago • 24
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published 27 days ago • 23
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models Paper • 2412.19645 • Published Dec 27, 2024 • 13
BrushEdit: All-In-One Image Inpainting and Editing Paper • 2412.10316 • Published Dec 13, 2024 • 33
Wonderland: Navigating 3D Scenes from a Single Image Paper • 2412.12091 • Published Dec 16, 2024 • 16
ColorFlow: Retrieval-Augmented Image Sequence Colorization Paper • 2412.11815 • Published Dec 16, 2024 • 26
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption Paper • 2412.09283 • Published Dec 12, 2024 • 19
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 139