---
license: cc-by-nc-4.0
language:
- en
tags:
- stable cascade
---

# Stable-Cascade FP16 fix

**A modified version of [Stable-Cascade](https://huggingface.co/stabilityai/stable-cascade) that is compatible with FP16 inference**

## Demo
| FP16 | BF16 |
| - | - |
|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/fkWNY15JQbfh5pe1SY7wS.png)|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/XpfqkimqJTeDjggTaV4Mt.png)|

LPIPS distance between the FP16 and BF16 images: 0.088


| FP16 | BF16 |
| - | - |
|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/muOkoNjVK6CFv2rs6QyBr.png)|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/rrgb8yMuJDyjJu6wd366j.png)|

LPIPS distance between the FP16 and BF16 images: 0.012

## How
After checking the L1 norm of each hidden state, I found that the last block group (of the 8, 24, 24, 8 block groups, the final 8) keeps making the hidden states larger and larger.
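A minimal sketch of this kind of check, written in plain Python so it runs anywhere. It assumes the hidden states have already been captured per block (in PyTorch this could be done with `Module.register_forward_hook`); `norm_report`, `first_growing_block`, the block names, and the numbers are all hypothetical and only illustrate the idea. Besides the mean absolute value (the L1 norm divided by the element count), it also reports the peak magnitude, because FP16 overflows to inf above 65504, while BF16 has the same exponent range as FP32 and does not.

```python
FP16_MAX = 65504.0  # largest finite float16 value


def norm_report(hidden_states):
    """hidden_states: list of (block_name, flat list of activation values).

    Returns (block_name, mean_abs, peak_abs, overflows_fp16) per block,
    where mean_abs is the L1 norm divided by the number of elements.
    """
    report = []
    for name, values in hidden_states:
        l1 = sum(abs(v) for v in values)
        peak = max(abs(v) for v in values)
        report.append((name, l1 / len(values), peak, peak > FP16_MAX))
    return report


def first_growing_block(report, factor=2.0):
    """Name of the first block whose mean_abs exceeds `factor` times the previous one's."""
    for (_, prev, _, _), (name, cur, _, _) in zip(report, report[1:]):
        if cur > factor * prev:
            return name
    return None


# Hypothetical captured values (block names and numbers are illustrative only).
states = [
    ("down_blocks.0", [0.5, -1.0, 0.8]),
    ("up_blocks.0", [1.2, -0.9, 2.0]),
    ("up_blocks.1.3", [300.0, -120.0, 800.0]),
    ("up_blocks.1.7", [9.0e4, -3.0e4, 7.0e4]),
]
for name, mean_abs, peak, overflow in norm_report(states):
    print(f"{name:15s} mean|x|={mean_abs:10.2f} peak={peak:10.1f} fp16_overflow={overflow}")
print("first large jump:", first_growing_block(norm_report(states)))
```

Looking at both numbers matters: a block group can be far from overflowing on average while a few channels already exceed the FP16 range.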

So I apply a transformation to the TimestepBlock that directly changes the scale of the hidden state. (This is possible because the TimestepBlock is not a residual block.)
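A sketch of why a non-residual block allows this, assuming the TimestepBlock computes `x * (1 + a) + b`, where `a` and `b` come from a linear mapper of the timestep embedding (this is how the reference Stable Cascade code is written as far as I know, but check your copy). Here it is reduced to a single channel with scalar weights. Under that assumption, multiplying the block's output by `s` can be folded exactly into the mapper's weights and biases. This only covers the TimestepBlock itself and is not the conversion in "stable_cascade.py", which also has to keep the rest of the block group consistent with the rescaled hidden state. `ToyTimestepBlock` and `fold_scale` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyTimestepBlock:
    """One channel of a TimestepBlock-style modulation (assumed form):
    out = x * (1 + a) + b, where a = wa * t + ba and b = wb * t + bb
    come from a linear "mapper" applied to the timestep embedding t."""
    wa: float
    ba: float
    wb: float
    bb: float

    def __call__(self, x: float, t: float) -> float:
        a = self.wa * t + self.ba
        b = self.wb * t + self.bb
        return x * (1 + a) + b


def fold_scale(block: ToyTimestepBlock, s: float) -> ToyTimestepBlock:
    """Return a block whose output is s * block(x, t) for every x and t."""
    return ToyTimestepBlock(
        wa=s * block.wa,
        ba=s * block.ba + (s - 1),  # makes 1 + a' equal s * (1 + a)
        wb=s * block.wb,            # together with bb, makes b' equal s * b
        bb=s * block.bb,
    )


block = ToyTimestepBlock(wa=0.3, ba=-0.1, wb=0.5, bb=0.2)
smaller = fold_scale(block, 0.25)  # shrink this block's output 4x
for x, t in [(2.0, 0.7), (-150.0, 1.3), (1e3, -0.4)]:
    assert math.isclose(smaller(x, t), 0.25 * block(x, t), rel_tol=1e-9)
```

If the real block sums several mapper outputs (for example one per condition), every mapper's weights and biases are multiplied by `s`, but the `(s - 1)` offset is added to only one of the biases.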

The details of the transformation are in the modified "stable_cascade.py". You can put that file into the stable-cascade branch of kohya-ss/sd-scripts and uncomment the relevant parts to check the weights or to do the conversion yourself.


### FP8
Some people may know the FP8 quantization trick used to run SDXL inference on low-VRAM cards. The same technique can be applied to this model.<br>
But since the last block group basically breaks under FP8, it is recommended to skip the last block group:<br>
```python
import torch

# generator_c: the Stage C model, loaded beforehand in FP16
for name, module in generator_c.named_modules():
    # Skip the last block group; its weights are left in FP16
    if "up_blocks.1" in name:
        continue
    # Store the weights of these layer types as FP8 (e5m2).
    # FP8 is only a storage format here: the inference code must cast
    # these weights back to FP16/BF16 before computing with them.
    if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d, torch.nn.MultiheadAttention)):
        module.to(torch.float8_e5m2)
```

This sample code should convert about 70% of the weights to FP8. (Storing FP8 weights together with a scale is a better solution, and implementing that is recommended.)
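A minimal sketch of "FP8 weights with a scale" (one scale per tensor), simulated in plain Python. `round_to_e5m2` is a hypothetical helper that emulates rounding to the nearest float8_e5m2 value (2 mantissa bits, exponent bias 15, largest finite value 57344) and saturates instead of producing inf. Each tensor is divided by a scale so that its largest magnitude maps to the top of the FP8 range, stored in FP8, and multiplied back by the scale when it is used. In PyTorch the rounding step would correspond to the `.to(torch.float8_e5m2)` cast, with the scale kept as a separate higher-precision value.

```python
import math

E5M2_MAX = 57344.0   # largest finite float8_e5m2 value (1.75 * 2**15)
E5M2_MIN_EXP = -14   # exponent of the smallest normal float8_e5m2 value


def round_to_e5m2(x: float) -> float:
    """Emulate round-to-nearest-even into float8_e5m2, saturating at +/-E5M2_MAX."""
    if x == 0.0:
        return 0.0
    mag = abs(x)
    _, e = math.frexp(mag)              # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E5M2_MIN_EXP)      # floor(log2(mag)); subnormals share 2**-14
    step = math.ldexp(1.0, exp - 2)     # 2 mantissa bits -> spacing 2**(exp - 2)
    q = round(mag / step) * step        # Python's round() is round-half-to-even
    return math.copysign(min(q, E5M2_MAX), x)


def quantize_with_scale(weights):
    """Per-tensor scaled FP8: returns (simulated fp8 values, scale)."""
    amax = max(abs(w) for w in weights)
    scale = amax / E5M2_MAX if amax > 0 else 1.0
    return [round_to_e5m2(w / scale) for w in weights], scale


def dequantize(q, scale):
    return [v * scale for v in q]


# Small weights like these underflow to zero without a scale.
weights = [1.0e-6, -2.0e-6, 5.0e-7, 1.3e-6]
print("no scale:  ", [round_to_e5m2(w) for w in weights])
q, scale = quantize_with_scale(weights)
print("with scale:", dequantize(q, scale))
```

With a scale, every restored weight stays within FP8's relative rounding error (at most 1/8 for e5m2 normals) instead of collapsing to zero or overflowing.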

I have tried other transformation settings that are friendlier to FP8, but they make the output differ more from the original model.

FP8 Demo (same seed):
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/wPoZeWGGhcPMck45--y_X.png)


## Notice
The modified model is not compatible with LoRA/LyCORIS models trained on the original weights. <br>
(Actually they can be made compatible by applying the same transformation to them. I'm considering writing a version that uses the key names to decide what to do.)
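A sketch of what applying the same transformation to a LoRA could look like, assuming the conversion only rescales a linear layer's output and/or input channels, i.e. W' = diag(out_scale) · W · diag(in_scale). Because the LoRA delta `up @ down` is added to W, the same scaling can be pushed into the rows of `up` and the columns of `down`. Additive changes, such as a bias offset, belong to the base layer and are not applied to the delta; a LoRA's alpha/rank multiplier is a single scalar and is unaffected. All function names, matrices, and scale values below are hypothetical, and plain lists stand in for tensors.

```python
def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale_weight(w, out_scale, in_scale):
    """W' = diag(out_scale) @ W @ diag(in_scale)."""
    return [[out_scale[i] * w[i][j] * in_scale[j] for j in range(len(w[0]))]
            for i in range(len(w))]


def scale_lora(up, down, out_scale, in_scale):
    """Apply the same channel scaling to a LoRA pair (delta = up @ down)."""
    new_up = [[out_scale[i] * v for v in row] for i, row in enumerate(up)]
    new_down = [[v * in_scale[j] for j, v in enumerate(row)] for row in down]
    return new_up, new_down


# Toy 3x2 layer with a rank-1 LoRA (all numbers are made up).
W = [[0.2, -0.5], [1.0, 0.3], [-0.7, 0.4]]
up = [[0.1], [-0.2], [0.05]]   # 3 x 1
down = [[0.6, -0.3]]           # 1 x 2
out_scale, in_scale = [0.5, 0.5, 2.0], [1.0, 0.25]

# Converting (W + delta) should equal the converted W plus the converted LoRA delta.
expected = scale_weight(mat_add(W, matmul(up, down)), out_scale, in_scale)
new_up, new_down = scale_lora(up, down, out_scale, in_scale)
actual = mat_add(scale_weight(W, out_scale, in_scale), matmul(new_up, new_down))
```

Deciding which scales apply to which LoRA tensor would then be a lookup on the key names.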

ControlNets will not be compatible either, unless the needed transformation is applied to them as well.

I don't want to do all of this by myself, so I hope others will pick it up.

## License
Stable-Cascade is published under a non-commercial license, so I publish this model under CC-BY-NC 4.0.
**The source code used to make this model is published under the Apache-2.0 license.**