nota-ai/bk-sdm-tiny

@@ -80,14 +80,22 @@ diffusers           0.15.0
 ```
 ## Compression Method
 ### U-Net Architecture
-We removed several residual and attention blocks from the 0.86B-parameter U-Net in the 1.04B-param SDM-v1.4, and our compressed models are summarized as follows.
-- 0.76B-param **BK-SDM-Base** (0.58B-param U-Net): obtained with ① fewer blocks in outer stages.
-- 0.66B-param **BK-SDM-Small** (0.49B-param U-Net): obtained with ① and ② mid-stage removal.
-- 0.50B-param **BK-SDM-Tiny** (0.33B-param U-Net): obtained with ①, ②, and ③ further inner-stage removal.
 ### Distillation Pretraining
@@ -95,7 +103,7 @@ The compact U-Net was trained to mimic the behavior of the original U-Net. We le
 <center>
-    <img alt="U-Net architectures and KD-based pretraining" img src="https://huggingface.co/spaces/nota-ai/compressed-stable-diffusion/resolve/e6fb31631f0b2948cf6ec54006ea050d6c83e940/docs/fig_model.png" width="100%">
 </center>
@@ -115,17 +123,17 @@ The following table shows the zero-shot results on 30K samples from the MS-COCO
 | Model | FID↓ | IS↑ | CLIP Score↑<br>(ViT-g/14) | # Params,<br>U-Net | # Params,<br>Whole SDM |
 |:---:|:---:|:---:|:---:|:---:|:---:|
-| Stable Diffusion v1.4 | 13.05 | 36.76 | 0.2958 | 0.86B | 1.04B |
-| BK-SDM-Base (Ours) | 15.76 | 33.79 | 0.2878 | 0.58B | 0.76B |
-| BK-SDM-Small (Ours) | 16.98 | 31.68 | 0.2677 | 0.49B | 0.66B |
-| BK-SDM-Tiny (Ours) | 17.12 | 30.09 | 0.2653 | 0.33B | 0.50B |
 <br/>
 The following figure depicts synthesized images with some MS-COCO captions.
 <center>
-    <img alt="Visual results" img src="https://huggingface.co/spaces/nota-ai/compressed-stable-diffusion/resolve/e6fb31631f0b2948cf6ec54006ea050d6c83e940/docs/fig_results.png" width="100%">
 </center>

 ```
 ## Compression Method
 ### U-Net Architecture
+Certain residual and attention blocks were eliminated from the U-Net of SDM-v1.4:
+- 1.04B-param [SDM-v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) (0.86B-param U-Net): the original source model.
+- 0.76B-param [**BK-SDM-Base**](https://huggingface.co/nota-ai/bk-sdm-base) (0.58B-param U-Net): obtained with ① fewer blocks in outer stages.
+- 0.66B-param [**BK-SDM-Small**](https://huggingface.co/nota-ai/bk-sdm-small) (0.49B-param U-Net): obtained with ① and ② mid-stage removal.
+- 0.50B-param [**BK-SDM-Tiny**](https://huggingface.co/nota-ai/bk-sdm-tiny) (0.33B-param U-Net): obtained with ①, ②, and ③ further inner-stage removal.
+<center>
+    <img alt="U-Net architectures" img src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/assets-bk-sdm/fig_arch.png" width="100%">
+</center>
 ### Distillation Pretraining
 <center>
+    <img alt="KD-based pretraining" img src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/assets-bk-sdm/fig_kd.png" width="100%">
 </center>
 | Model | FID↓ | IS↑ | CLIP Score↑<br>(ViT-g/14) | # Params,<br>U-Net | # Params,<br>Whole SDM |
 |:---:|:---:|:---:|:---:|:---:|:---:|
+| [Stable Diffusion v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) | 13.05 | 36.76 | 0.2958 | 0.86B | 1.04B |
+| [BK-SDM-Base](https://huggingface.co/nota-ai/bk-sdm-base) (Ours) | 15.76 | 33.79 | 0.2878 | 0.58B | 0.76B |
+| [BK-SDM-Small](https://huggingface.co/nota-ai/bk-sdm-small) (Ours) | 16.98 | 31.68 | 0.2677 | 0.49B | 0.66B |
+| [BK-SDM-Tiny](https://huggingface.co/nota-ai/bk-sdm-tiny) (Ours) | 17.12 | 30.09 | 0.2653 | 0.33B | 0.50B |
 <br/>
 The following figure depicts synthesized images with some MS-COCO captions.
 <center>
+    <img alt="Visual results" img src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/assets-bk-sdm/fig_results.png" width="100%">
 </center>
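
The parameter counts in the table above can be turned into relative size reductions with simple arithmetic. The following is a minimal sketch, not part of the original model card: the dictionary `PARAMS_B` and the helpers `reduction_pct` and `summarize` are hypothetical names introduced for illustration, and the figures are the rounded values from the table, so the percentages are approximate.

```python
# Parameter counts in billions, copied from the results table above.
# The names PARAMS_B, reduction_pct, and summarize are illustrative only.
PARAMS_B = {
    "Stable Diffusion v1.4": {"unet": 0.86, "whole": 1.04},
    "BK-SDM-Base": {"unet": 0.58, "whole": 0.76},
    "BK-SDM-Small": {"unet": 0.49, "whole": 0.66},
    "BK-SDM-Tiny": {"unet": 0.33, "whole": 0.50},
}


def reduction_pct(original: float, compressed: float) -> float:
    """Return the percentage of parameters removed relative to the original."""
    return 100.0 * (original - compressed) / original


def summarize(baseline: str = "Stable Diffusion v1.4"):
    """Return (model, U-Net reduction %, whole-model reduction %) per compressed model."""
    base = PARAMS_B[baseline]
    rows = []
    for name, counts in PARAMS_B.items():
        if name == baseline:
            continue  # the baseline is not compared against itself
        rows.append(
            (
                name,
                round(reduction_pct(base["unet"], counts["unet"]), 1),
                round(reduction_pct(base["whole"], counts["whole"]), 1),
            )
        )
    return rows


if __name__ == "__main__":
    for name, unet_pct, whole_pct in summarize():
        print(f"{name}: U-Net -{unet_pct}%, whole model -{whole_pct}%")
```

Because the inputs are rounded to two decimals, the results are best read as rough figures: BK-SDM-Tiny, for example, drops a little over 60% of the U-Net parameters and roughly half of the whole model.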