Tags: Text-to-Image · Diffusers · English · SVDQuant · FLUX.1-dev · INT4 · FLUX.1 · Diffusion · Quantization
Lmxyy committed (verified) · Commit ebd7472 · 1 parent: daedec7

Update README.md

Files changed (1): README.md (+1 −0)
README.md CHANGED
@@ -31,6 +31,7 @@ library_name: diffusers
 <a href='https://hanlab.mit.edu/projects/svdquant'>[Website]</a>&ensp;
 <a href='https://hanlab.mit.edu/blog/svdquant'>[Blog]</a>
 </div>
+
 ![teaser](https://github.com/mit-han-lab/nunchaku/raw/refs/heads/main/assets/teaser.jpg)
 SVDQuant is a post-training quantization technique for 4-bit weights and activations that maintains visual fidelity well. On the 12B FLUX.1-dev, it achieves a 3.6× memory reduction over the BF16 model. By eliminating CPU offloading, it delivers an 8.7× speedup over the 16-bit model on a 16GB laptop 4090 GPU and runs 3× faster than the NF4 W4A16 baseline. On PixArt-Σ, it demonstrates significantly superior visual quality to other W4A4 and even W4A8 baselines. "E2E" means end-to-end latency, including the text encoder and the VAE decoder.
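
For context, the README this commit touches describes an INT4 FLUX.1-dev checkpoint served by the nunchaku inference engine. Below is a minimal usage sketch of how such a checkpoint is typically loaded into a standard diffusers pipeline; it assumes the nunchaku package exposes `NunchakuFluxTransformer2dModel` and that the repo id is `mit-han-lab/svdq-int4-flux.1-dev` (both should be verified against the project README).

```python
# Minimal sketch: swap the 16-bit FLUX transformer for the SVDQuant INT4
# version inside a standard diffusers FluxPipeline. Assumes
# `pip install diffusers nunchaku` and that nunchaku exposes
# NunchakuFluxTransformer2dModel; the repo id below is also an assumption.
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel

# Load the 4-bit quantized transformer produced by SVDQuant.
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-dev"
)

# Build the usual FLUX.1-dev pipeline, replacing only the transformer;
# the text encoders and VAE stay in bfloat16.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipeline(
    "A cat holding a sign that says hello world",
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("flux.1-dev-int4.png")
```

Because the 4-bit transformer fits in GPU memory alongside the rest of the pipeline, no CPU offloading is needed, which is where the speedup quoted in the README comes from.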