metadata

license: cc-by-nc-4.0

InstaFlow: 2-Rectified Flow fine-tuned from Stable Diffusion v1.5

2-Rectified Flow is a few-step text-to-image generative model fine-tuned from Stable Diffusion v1.5.

We use text-conditioned reflow as described in our paper.
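As a rough illustration of the reflow objective, the sketch below computes the loss for a single (noise, image latent) pair: the velocity model is evaluated at a point on the straight line between the two and regressed onto their difference. This is a toy sketch under our own assumptions, not the training code. The names reflow_loss, perfect_velocity, and zero_velocity are hypothetical, the vectors are made-up stand-ins for latents, and the cond argument merely stands in for the text conditioning.

```python
import random

def reflow_loss(x0, x1, velocity, t, cond=None):
    """Mean squared error between the predicted velocity and x1 - x0."""
    # Point on the straight line from noise x0 (t=0) to latent x1 (t=1).
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # Regression target: the constant velocity of that straight line.
    target = [b - a for a, b in zip(x0, x1)]
    pred = velocity(xt, t, cond)
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(target)

# Made-up 3-dimensional stand-ins for a noise sample and its paired latent.
x0 = [0.5, -1.0, 2.0]
x1 = [1.5, 0.0, -1.0]

def perfect_velocity(xt, t, cond):
    # Hypothetical model that already predicts the straight-line velocity.
    return [b - a for a, b in zip(x0, x1)]

def zero_velocity(xt, t, cond):
    # Hypothetical untrained model that predicts no motion at all.
    return [0.0 for _ in xt]

t = random.random()  # time sampled uniformly in [0, 1)
loss_perfect = reflow_loss(x0, x1, perfect_velocity, t, cond="a prompt")
loss_zero = reflow_loss(x0, x1, zero_velocity, t, cond="a prompt")
print(loss_perfect, loss_zero)
```

In actual training the velocity would be a neural network conditioned on a text embedding, and the loss would be averaged over a batch of pairs and random times.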

Reflow has interesting theoretical properties. You may check this ICLR paper and this arXiv paper.

Images Generated from Random DiffusionDB Prompts


Prompt: a renaissance portrait of dwayne johnson, art in the style of rembrandt.


Prompt: a photo of a rabbit head on a grizzly bear body.

Usage

Please refer to the official GitHub repo.

Training

Training pipeline:

Reflow (Stage 1): We train the model using the text-conditioned reflow objective with a batch size of 64 for 70,000 iterations. The model is initialized from the pre-trained SD 1.5 weights. (11.2 A100 GPU days)
Reflow (Stage 2): We continue to train the model using the text-conditioned reflow objective with an increased batch size of 1024 for 25,000 iterations. (64 A100 GPU days)

The final model is 2-Rectified Flow.

Total Training Cost: It takes 75.2 A100 GPU days to get 2-Rectified Flow.
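The total can be checked from the per-stage figures above. The short sketch below sums the GPU days and, assuming each iteration processes exactly one batch (our assumption, not stated in the card), also estimates how many training samples each stage processes. The variable names are illustrative only.

```python
import math

# (batch size, iterations, A100 GPU days) per stage, as listed in the training pipeline.
stages = {
    "Reflow (Stage 1)": (64, 70_000, 11.2),
    "Reflow (Stage 2)": (1024, 25_000, 64.0),
}

total_gpu_days = sum(days for _, _, days in stages.values())

# Assumption: one batch per iteration, so samples = batch size * iterations.
samples_seen = {name: bs * iters for name, (bs, iters, _) in stages.items()}

print(f"total: {total_gpu_days:.1f} A100 GPU days")
for name, n in samples_seen.items():
    print(f"{name}: {n:,} training samples processed")
```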

Evaluation Results - Metrics

The following metrics of 2-Rectified Flow are measured on MS COCO 2017 with 5,000 images and a 25-step Euler solver.

FID-5k = 21.5, CLIP score = 0.315
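For readers unfamiliar with the sampler, a fixed-step Euler solver integrates the flow ODE dx/dt = v(x, t) from t = 0 (noise) to t = 1 (sample). The sketch below is a minimal toy version, not the evaluation code: euler_sample and toy_velocity are hypothetical names, and the velocity field is a hand-written one whose trajectories are straight lines toward a fixed target, standing in for the trained network.

```python
def euler_sample(x0, velocity, num_steps=25):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        # Euler update: move along the current velocity for one step of size dt.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Made-up target standing in for a generated latent.
target = [1.0, -2.0, 0.5]

def toy_velocity(x, t):
    # Hypothetical field with straight trajectories ending at `target` at t=1.
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]

noise = [0.0, 0.0, 0.0]
sample_25 = euler_sample(noise, toy_velocity, num_steps=25)
sample_1 = euler_sample(noise, toy_velocity, num_steps=1)
print(sample_25, sample_1)
```

Because the toy trajectories are perfectly straight, one Euler step reaches the same endpoint as 25 steps. This is the intuition behind few-step sampling with straightened flows; a trained model is only approximately straight, so the step count still affects quality.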

Evaluation Results - Impact of Guidance Scale

Citation

@article{liu2023insta,
  title={InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation},
  author={Liu, Xingchao and Zhang, Xiwen and Ma, Jianzhu and Peng, Jian and Liu, Qiang},
  journal={arXiv preprint arXiv:2309.06380},
  year={2023}
}