Reward-based Noise Optimization for one-step text-to-image (T2I) models
A VLM-based message decoder trained via GRPO
Generate responses in a chat with Qwen, a helpful assistant