---
pipeline_tag: text-to-image
inference: false
license: other
license_name: stabilityai-nc-research-community
license_link: LICENSE
tags:
- tensorrt
- sd3
- sd3-medium
- text-to-image
- onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/LICENSE)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
    - 'Yes'
    - 'No'
  I acknowledge that this model is for non-commercial use only unless I acquire a separate license from Stability AI: checkbox
language:
- en
---

# Stable Diffusion 3 Medium TensorRT
## Introduction

This repository hosts the TensorRT version of **Stable Diffusion 3 Medium**, created in collaboration with [NVIDIA](https://huggingface.co/nvidia). The optimized models deliver substantial improvements in inference speed and efficiency.

Stable Diffusion 3 Medium is a fast generative text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities.

## Model Details

### Model Description
Stable Diffusion 3 Medium combines a diffusion transformer (MMDiT) architecture with flow matching.

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image model
- **Model Description:** This is a TensorRT conversion of the [Stable Diffusion 3 Medium](https://huggingface.co/stabilityai/stable-diffusion-3-medium) model
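
For context on what "flow matching" means here, the conceptual sketch below (illustrative Python, not code from this repository) shows the basic sampling idea: the MMDiT predicts a velocity field that is integrated from pure noise to image latents with a simple ODE solver. The `velocity_model` interface, time convention, and step count are assumptions made for the illustration.

```python
# Conceptual sketch only: generic flow-matching sampling, not the TensorRT demo code.
import torch

@torch.no_grad()
def flow_matching_sample(velocity_model, latents_shape, conditioning, num_steps=50):
    """Integrate a learned velocity field from noise (t=1) to data (t=0) with Euler steps."""
    x = torch.randn(latents_shape)                        # start from pure Gaussian noise
    timesteps = torch.linspace(1.0, 0.0, num_steps + 1)   # schedule from t=1 down to t=0
    for i in range(num_steps):
        t, t_next = timesteps[i], timesteps[i + 1]
        # velocity_model stands in for the MMDiT: it predicts dx/dt from the
        # current latents, the timestep, and the text conditioning.
        v = velocity_model(x, t, conditioning)
        x = x + (t_next - t) * v                          # Euler step toward the data end
    return x                                              # final latents, decoded by the VAE
```

In the full pipeline, the text encoders (CLIP-G, CLIP-L, T5-XXL) produce the conditioning and the VAE decoder turns the final latents into an image, which is why those components appear separately in the timing tables below.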


## Performance using TensorRT 10.1
#### Timings for 50 steps at 1024x1024

| Accelerator | CLIP-G      | CLIP-L       | T5XXL         | MMDiT                 | VAE Decoder         | Total                  |
|-------------|-------------|--------------|---------------|-----------------------|---------------------|------------------------|
| A100        | 11.95 ms    | 5.04 ms      | 21.39 ms      | 5468.17 ms            | 72.25 ms            | 5622.47 ms             |

#### Timings for 30 steps at 1024x1024 with input image conditioning

| Accelerator | VAE Encoder    | CLIP-G      | CLIP-L       | T5XXL         | MMDiT                 | VAE Decoder         | Total          |
|-------------|----------------|-------------|--------------|---------------|-----------------------|---------------------|----------------|
| A100        | 37.04 ms       | 12.07 ms    | 5.07 ms      | 21.49 ms      | 3340.69 ms            | 72.02 ms            | 3531.49 ms     |


## INT8 quantization with [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer)
The MMDiT in Stable Diffusion 3 Medium can be further optimized with INT8 quantization using TensorRT Model Optimizer. The estimated end-to-end speedup of TensorRT INT8 over TensorRT FP16 is 1.2x to 1.4x across a range of NVIDIA GPUs, and the INT8 MMDiT engine uses roughly half the memory of its FP16 counterpart. Image quality is maintained with minimal to negligible degradation.
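
As a rough, hedged illustration of that workflow (an assumption about how it could be done with Model Optimizer's Python API, not this repository's exact recipe), calibration-based INT8 quantization of the MMDiT might look like the sketch below; the `run_pipeline` hook, the calibration prompts, and the choice of `INT8_DEFAULT_CFG` are placeholders, and the official TensorRT / Model Optimizer diffusion examples remain the authoritative reference.

```python
# Hypothetical sketch: INT8 post-training quantization of the MMDiT with
# NVIDIA TensorRT Model Optimizer (modelopt). Calibration wiring is a placeholder.
import modelopt.torch.quantization as mtq

def quantize_mmdit_int8(mmdit, calibration_prompts, run_pipeline):
    def forward_loop(model):
        # Run a few denoising passes so modelopt can collect activation
        # statistics for INT8 calibration.
        for prompt in calibration_prompts:
            run_pipeline(prompt)  # placeholder: runs the SD3 pipeline that wraps `model`

    # INT8_DEFAULT_CFG quantizes weights and activations of supported layers
    # (e.g. Linear/Conv) inside the transformer.
    return mtq.quantize(mmdit, mtq.INT8_DEFAULT_CFG, forward_loop)
```

The quantized module would then be exported to ONNX and built into a TensorRT engine in place of the FP16 MMDiT engine used in the steps below.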

## Usage Example
1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd3/demo/Diffusion/README.md) for launching a TensorRT NGC container.
```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd3
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:24.05-py3 /bin/bash
```

2. Download the Stable Diffusion 3 Medium TensorRT files from this repo
```shell
git lfs install 
git clone https://huggingface.co/stabilityai/stable-diffusion-3-medium-tensorrt
cd stable-diffusion-3-medium-tensorrt
git lfs pull
cd ..
```

3. Install libraries and requirements
```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```


4. Perform TensorRT optimized inference:

  - **Stable Diffusion 3 Medium**
        
    Works best for 1024x1024 images. The first invocation builds TensorRT engine plan files in `--engine-dir`; these plans are specific to the accelerator they are built on and are reused by later invocations.
    ```shell
    python3 demo_txt2img_sd3.py \
      "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" \
      --version=sd3 \
      --onnx-dir /workspace/stable-diffusion-3-medium-tensorrt/ \
      --engine-dir /workspace/stable-diffusion-3-medium-tensorrt/engine \
      --seed 42 \
      --width 1024 \
      --height 1024 \
      --build-static-batch \
      --use-cuda-graph
    ```

  - **Stable Diffusion 3 Medium with input image conditioning**
        
    Provide an input image for conditioning as shown below. Works best at 1024x1024 but may also work at 512x512.
    ```shell
    wget https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png -O dog-on-bench.png

    python3 demo_txt2img_sd3.py \
      "dog wearing a sweater and a blue collar" \
      --version=sd3 \
      --onnx-dir /workspace/stable-diffusion-3-medium-tensorrt/ \
      --engine-dir /workspace/stable-diffusion-3-medium-tensorrt/engine \
      --seed 42 \
      --width 1024 \
      --height 1024 \
      --input-image dog-on-bench.png \
      --build-static-batch \
      --use-cuda-graph
    ```