High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation
Abstract
A 2-step image generation model is developed through distillation from an 8-step teacher using distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization.
Few-step diffusion distillation has become increasingly mature for 4- to 8-step generation, yet pushing further to 2 steps remains challenging. In this work, we introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. Our method addresses the central bottlenecks of 2-step generation, namely increased task difficulty and limited model capacity, through three simple but effective design choices tailored to this regime. First, we propose Distribution-Aligned Adversarial Learning, which uses teacher-generated images rather than external real images as the real samples for GAN training, providing a more attainable and informative adversarial target. Second, we adopt Step-Decoupled Parameterization, assigning independent model parameters to the two denoising steps to better match their distinct capacity demands. Third, we perform End-to-End Training with Iterative Regularization, allowing the first step to receive gradients from final image quality while preserving a meaningful intermediate generation through an explicit step-1 loss. Together, these designs substantially narrow the quality gap between 2-step and 8-step generation in both qualitative and quantitative evaluations, highlighting the potential of carefully tailored distillation strategies for improving the quality-efficiency trade-off in few-step generation.
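The three design choices in the abstract can be sketched in toy form. This is a minimal illustration under stated assumptions, not the authors' implementation: "images" are plain floats, each denoising step is a single scalar weight, the logistic (non-saturating) GAN loss and the squared-error step-1 loss are assumed forms, and every name here (`TwoStepStudent`, `student_loss`, `lam`, `step1_targets`) is hypothetical.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(d, teacher_images, student_images):
    # Distribution-aligned adversarial learning: the "real" batch is
    # teacher output, not external photographs. The logistic loss form
    # is an assumption; the abstract does not specify it.
    real = sum(softplus(-d(x)) for x in teacher_images) / len(teacher_images)
    fake = sum(softplus(d(x)) for x in student_images) / len(student_images)
    return real + fake


def generator_adv_loss(d, student_images):
    # The student is rewarded when the discriminator scores its images
    # like teacher images.
    return sum(softplus(-d(x)) for x in student_images) / len(student_images)


class TwoStepStudent:
    # Step-decoupled parameterization: each denoising step owns its
    # parameters (here one scalar per step, purely for illustration).
    def __init__(self, w1, w2):
        self.w1 = w1  # used by step 1 only
        self.w2 = w2  # used by step 2 only

    def generate(self, noise):
        x_mid = self.w1 * noise   # step 1: noise -> intermediate image
        x_final = self.w2 * x_mid  # step 2: intermediate -> final image
        return x_mid, x_final


def student_loss(student, d, noise_batch, step1_targets, lam=0.5):
    # End-to-end: the adversarial term is computed on the final image,
    # which depends on both w1 and w2. The extra step-1 term keeps the
    # intermediate image meaningful. Squared error and the weight `lam`
    # are assumptions made for this sketch.
    mids, finals = zip(*(student.generate(z) for z in noise_batch))
    adv = generator_adv_loss(d, finals)
    reg = sum((m - t) ** 2 for m, t in zip(mids, step1_targets)) / len(mids)
    return adv + lam * reg
```

In this toy setting, changing `w1` alters both the step-1 regularizer and the final-image adversarial term, which is the sense in which the first step is trained against final image quality rather than in isolation.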
Community
We introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. With distribution-aligned adversarial learning, step-decoupled parameterization, and end-to-end training with iterative regularization, Z-Image Turbo++ substantially narrows the quality gap between 2-step and 8-step generation while keeping inference to only two denoising steps.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation (2026)
- One-Forcing: Towards Stable One-Step Autoregressive Video Generation (2026)
- Noise-Started One-Step Real-World Super-Resolution via LR-Conditioned SplitMeanFlow and GAN Refinement (2026)
- One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration (2026)
- Reinforcing Few-step Generators via Reward-Tilted Distribution Matching (2026)
- AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation (2026)
- Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations (2026)
Neat paper. Pushing distillation down to just two steps while keeping the quality high is impressive, especially since the jump from 8 steps to 2 is usually where things fall apart. The idea of using the teacher's output as the target for adversarial training instead of external data feels like a really clean way to handle the training distribution.
Do you think the step-decoupled parameterization makes the model significantly harder to train or tune compared to sharing weights across steps?
I made a podcast on it with ResearchPod; it makes it easy to get the key concepts on the go:
https://researchpod.app/episode/76e3cbca-a3eb-4620-83e0-df7d6dff5901
Get this paper in your agent:
hf papers read 2606.12575
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash