- Exploring the Evolution of Physics Cognition in Video Generation: A Survey
  Paper • 2503.21765 • Published • 11
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
  Paper • 2503.21755 • Published • 31
- Wan: Open and Advanced Large-Scale Video Generative Models
  Paper • 2503.20314 • Published • 47
- Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
  Paper • 2502.06782 • Published • 14
Collections including paper arxiv:2503.21755

- Reinforcement Learning: An Overview
  Paper • 2412.05265 • Published • 7
- Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
  Paper • 2411.01156 • Published • 6
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
  Paper • 2503.21755 • Published • 31
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 126

- GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
  Paper • 2411.18499 • Published • 18
- VLSBench: Unveiling Visual Leakage in Multimodal Safety
  Paper • 2411.19939 • Published • 10
- AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
  Paper • 2412.02611 • Published • 24
- U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
  Paper • 2412.03205 • Published • 16

- VBench Leaderboard
  📊 Upload model data and get detailed evaluation scores • 255
- VBench: Comprehensive Benchmark Suite for Video Generative Models
  Paper • 2311.17982 • Published • 9
- VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
  Paper • 2411.13503 • Published • 35
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
  Paper • 2503.21755 • Published • 31

- Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
  Paper • 2407.07053 • Published • 47
- LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
  Paper • 2407.12772 • Published • 36
- VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
  Paper • 2407.11691 • Published • 14
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
  Paper • 2408.02718 • Published • 62

- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
  Paper • 2312.04557 • Published • 13
- Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
  Paper • 2312.04410 • Published • 15
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
  Paper • 2312.04461 • Published • 62
- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
  Paper • 2401.02955 • Published • 23