onnxruntime/sd-turbo

@@ -2,8 +2,8 @@
 pipeline_tag: text-to-image
 license: other
 license_name: sai-nc-community
-license_link: https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.TXT
-base_model: stabilityai/sdxl-turbo
 language:
   - en
 tags:
@@ -14,15 +14,15 @@ tags:
   - text-to-image
 ---
-# Stable Diffusion XL Turbo for ONNX Runtime CUDA
 ## Introduction
-This repository hosts the optimized onnx models of **SDXL Turbo** to accelerate inference with ONNX Runtime CUDA execution provider for Nvidia GPUs. It cannot run in other providers like CPU or DirectML.
 The models are generated by [Olive](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion) with command like the following:
 ```
-python stable_diffusion_xl.py --provider cuda --model_id stabilityai/sdxl-turbo --optimize --use_fp16_fixed_vae
 ```
 See the [usage instructions](#usage-example) for how to run the SDXL pipeline with the ONNX files hosted in this repository.
@@ -32,9 +32,24 @@ See the [usage instructions](#usage-example) for how to run the SDXL pipeline wi
 - **Developed by:** Stability AI
 - **Model type:** Diffusion-based text-to-image generative model
 - **License:** [STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE](https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE)
-- **Model Description:** This is a conversion of the [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) model for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with CUDA execution provider.
-The VAE decoder is converted from [sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix). There are slight discrepancies between its output and that of the original VAE, but the decoded images should be [close enough for most purposes](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/discussions/7#64c5c0f8e2e5c94bd04eaa80).
 ## Usage Example
@@ -48,10 +63,10 @@ git clone https://github.com/microsoft/onnxruntime
 cd onnxruntime
 ```
-2. Download the SDXL ONNX files from this repo
 ```shell
 git lfs install
-git clone https://huggingface.co/tlwu/sdxl-turbo-onnxruntime
 ```
 3. Launch the docker
@@ -84,8 +99,8 @@ python3 -m pip install --upgrade polygraphy onnx-graphsurgeon --extra-index-url
 6. Perform ONNX Runtime optimized inference
 ```shell
-python3 demo_txt2img_xl.py \
   "starry night over Golden Gate Bridge by van gogh" \
-  --version xl-turbo   \
-  --engine-dir /workspace/sdxl-turbo-onnxruntime
 ```

 pipeline_tag: text-to-image
 license: other
 license_name: sai-nc-community
+license_link: https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE.TXT
+base_model: stabilityai/sd-turbo
 language:
   - en
 tags:
   - text-to-image
 ---
+# Stable Diffusion Turbo for ONNX Runtime CUDA
 ## Introduction
+This repository hosts optimized ONNX models of **SD Turbo** that accelerate inference with the ONNX Runtime CUDA execution provider on NVIDIA GPUs. The models cannot run with other execution providers such as CPU or DirectML.
 The models are generated by [Olive](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion) with command like the following:
 ```
+python stable_diffusion.py --provider cuda --model_id stabilityai/sd-turbo --optimize --use_fp16_fixed_vae
 ```
 See the [usage instructions](#usage-example) for how to run the SDXL pipeline with the ONNX files hosted in this repository.
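Because these models target only the CUDA execution provider, it can help to confirm that the local ONNX Runtime build exposes it before running the pipeline. The following is a minimal sketch and not part of this repository: `has_cuda_provider` is a hypothetical helper, and the script assumes a GPU-enabled ONNX Runtime package may or may not be installed.

```python
def has_cuda_provider(available_providers):
    """Return True if the CUDA execution provider is in the given list.

    Hypothetical helper for illustration; it only inspects provider names.
    """
    return "CUDAExecutionProvider" in available_providers


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this environment.")
    else:
        # get_available_providers() lists the providers this build supports.
        providers = ort.get_available_providers()
        if has_cuda_provider(providers):
            print("CUDA execution provider is available.")
        else:
            print("CUDA execution provider not found; available:", providers)
```

If the check reports that the CUDA provider is missing, the models in this repository are not expected to run in that environment.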
 - **Developed by:** Stability AI
 - **Model type:** Diffusion-based text-to-image generative model
 - **License:** [STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE](https://huggingface.co/stabilityai/sd-turbo/blob/main/LICENSE)
+- **Model Description:** This is a conversion of the [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) model for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with the CUDA execution provider.
+## Performance
+#### Latency
+The table below shows the average latency of generating an image of size 512x512 on an NVIDIA A100-SXM4-80GB GPU:
+| Engine      | Batch Size | Steps | ONNX Runtime CUDA |
+|-------------|------------|------ | ----------------- |
+| Static      | 1          |   1   | 38.2 ms           |
+| Static      | 4          |   1   | 120.2 ms          |
+| Static      | 1          |   4   | 68.7 ms           |
+| Static      | 4          |   4   | 192.6 ms          |
+Static means the engine is built for the given combination of batch size and image size, and CUDA graph is used to speed up inference.
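To compare the rows more directly, the latency figures can be converted into per-image latency and throughput. The sketch below assumes each reported latency covers the whole batch (the table does not state this explicitly); the function names are illustrative only.

```python
# Latency figures copied from the table above.
# Key: (batch_size, steps); value: latency in milliseconds.
# Assumption: each value is the time to generate the whole batch.
LATENCY_MS = {
    (1, 1): 38.2,
    (4, 1): 120.2,
    (1, 4): 68.7,
    (4, 4): 192.6,
}


def per_image_latency_ms(batch_size, latency_ms):
    """Average time spent per image within one batch."""
    return latency_ms / batch_size


def images_per_second(batch_size, latency_ms):
    """Throughput implied by one batch taking latency_ms milliseconds."""
    return batch_size * 1000.0 / latency_ms


if __name__ == "__main__":
    for (batch_size, steps), latency_ms in LATENCY_MS.items():
        print(
            f"batch={batch_size} steps={steps}: "
            f"{per_image_latency_ms(batch_size, latency_ms):.2f} ms/image, "
            f"{images_per_second(batch_size, latency_ms):.1f} images/s"
        )
```

Under that assumption, batching four images at one step lowers the per-image cost from 38.2 ms to about 30 ms.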
 ## Usage Example
 cd onnxruntime
 ```
+2. Download the ONNX files from this repo
 ```shell
 git lfs install
+git clone https://huggingface.co/tlwu/sd-turbo-onnxruntime
 ```
 3. Launch the docker
 6. Perform ONNX Runtime optimized inference
 ```shell
+python3 demo_txt2img.py \
   "starry night over Golden Gate Bridge by van gogh" \
+  --version sd-turbo   \
+  --engine-dir /workspace/sd-turbo-onnxruntime
 ```