VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
Abstract
We present a general strategy for aligning visual generation models -- both image and video generation -- with human preference. We first build VisionReward, a fine-grained and multi-dimensional reward model. We decompose human preferences over images and videos into multiple dimensions, each represented by a series of judgment questions whose answers are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance in video preference prediction. Based on VisionReward, we develop a multi-objective preference learning algorithm that effectively addresses the issue of confounding factors within preference data. Our approach significantly outperforms existing image and video scoring methods on both machine metrics and human evaluation. All code and datasets are provided at https://github.com/THUDM/VisionReward.
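As a rough illustration of the scoring scheme described in the abstract, the sketch below linearly weights and sums binary answers to per-dimension judgment questions into a single scalar reward. This is a minimal sketch, not the released implementation; names such as `vision_reward_score`, `judgment_answers`, and `weights`, along with the example dimensions and weight values, are hypothetical.

```python
# Minimal sketch (assumed structure, not the released VisionReward code):
# each preference dimension is probed by several yes/no judgment questions,
# and the binary answers are combined with learned linear weights into one
# interpretable scalar score.

from typing import Dict, List


def vision_reward_score(
    judgment_answers: Dict[str, List[bool]],  # dimension -> answers to its judgment questions
    weights: Dict[str, List[float]],          # dimension -> learned per-question weights (hypothetical)
    bias: float = 0.0,
) -> float:
    """Linearly weight and sum judgment answers into a single reward score."""
    score = bias
    for dim, answers in judgment_answers.items():
        for answer, w in zip(answers, weights[dim]):
            score += w * (1.0 if answer else 0.0)
    return score


# Usage with made-up dimensions and weights:
answers = {"composition": [True, False], "motion_stability": [True]}
w = {"composition": [0.8, 0.5], "motion_stability": [1.2]}
print(vision_reward_score(answers, w))  # 0.8 + 0.0 + 1.2 = 2.0
```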