From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space
Abstract
Multi-View GRPO enhances text-to-image flow model alignment by expanding the condition space, yielding a denser reward mapping and better exploration of inter-sample relationships.
Group Relative Policy Optimization (GRPO) has emerged as a powerful framework for preference alignment in text-to-image (T2I) flow models. However, we observe that the standard paradigm, in which a group of generated samples is evaluated against a single condition, suffers from insufficient exploration of inter-sample relationships, constraining both alignment efficacy and performance ceilings. To address this sparse single-view evaluation scheme, we propose Multi-View GRPO (MV-GRPO), a novel approach that enhances relationship exploration by augmenting the condition space to create a dense multi-view reward mapping. Specifically, for a group of samples generated from one prompt, MV-GRPO leverages a flexible Condition Enhancer to generate semantically adjacent yet diverse captions. These captions enable multi-view advantage re-estimation, capturing diverse semantic attributes and providing richer optimization signals. By deriving the probability distribution of the original samples conditioned on these new captions, we can incorporate them into the training process without costly sample regeneration. Extensive experiments demonstrate that MV-GRPO achieves superior alignment performance over state-of-the-art methods.
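The abstract does not spell out how the per-view advantages are combined, so the following is a minimal sketch of one plausible reading: rewards for the same group of samples are group-normalized per caption view (the standard GRPO advantage) and then averaged across views into one dense signal per sample. The function name `multi_view_advantages`, the reward-matrix layout, and the mean aggregation are illustrative assumptions, not the paper's stated implementation.

```python
import numpy as np

def multi_view_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages re-estimated over multiple caption views.

    rewards: shape (G, K). Entry [g, k] is the reward of the g-th sample in
    the group when scored against the k-th caption. Column 0 could hold the
    original prompt; the remaining columns would come from the Condition
    Enhancer. Returns one dense advantage per sample, shape (G,).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # Standard GRPO normalization, applied independently per view (column):
    # subtract the group mean and divide by the group std of that view.
    mean = rewards.mean(axis=0, keepdims=True)
    std = rewards.std(axis=0, keepdims=True)
    per_view_adv = (rewards - mean) / (std + eps)  # shape (G, K)
    # Aggregate the per-view advantages into a single training signal per sample.
    return per_view_adv.mean(axis=1)


# Example: a group of 4 samples scored against the original prompt plus
# 2 augmented captions (3 views in total).
rewards = np.array([
    [0.80, 0.55, 0.70],
    [0.60, 0.75, 0.65],
    [0.40, 0.50, 0.45],
    [0.90, 0.60, 0.80],
])
print(multi_view_advantages(rewards))
```

In this reading, avoiding sample regeneration would only require evaluating the policy's log-probability of the already-generated samples under each augmented caption when forming the policy-gradient loss; the sketch above covers only the advantage re-estimation step.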
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment (2026)
- Know Your Step: Faster and Better Alignment for Flow Matching Models via Step-aware Advantages (2026)
- HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models (2026)
- Advances in GRPO for Generation Models: A Survey (2026)
- Euphonium: Steering Video Flow Matching via Process Reward Gradient Guided Stochastic Dynamics (2026)
- PromptRL: Prompt Matters in RL for Flow-Based Image Generation (2026)
- Towards Unified Multimodal Interleaved Generation via Group Relative Policy Optimization (2026)