metafresh89 DionTimmer commited on
Commit
272f751
·
0 Parent(s):

Duplicate from DionTimmer/controlnet_qrcode-control_v1p_sd15

Browse files

Co-authored-by: Dion Timmer <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,34 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tflite filter=lfs diff=lfs merge=lfs -text
29
+ *.tgz filter=lfs diff=lfs merge=lfs -text
30
+ *.wasm filter=lfs diff=lfs merge=lfs -text
31
+ *.xz filter=lfs diff=lfs merge=lfs -text
32
+ *.zip filter=lfs diff=lfs merge=lfs -text
33
+ *.zst filter=lfs diff=lfs merge=lfs -text
34
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,97 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ tags:
3
+ - stable-diffusion
4
+ - controlnet
5
+ - image-to-image
6
+ license: openrail++
7
+ language:
8
+ - en
9
+ library_name: diffusers
10
+ pipeline_tag: image-to-image
11
+ duplicated_from: DionTimmer/controlnet_qrcode-control_v1p_sd15
12
+ ---
13
+ # QR Code Conditioned ControlNet Models for Stable Diffusion 1.5
14
+
15
+ ![1](https://www.dropbox.com/s/fxyuqpot2z2ftty/5.png?raw=1)
16
+
17
+ ## Model Description
18
+
19
+ This repo holds the safetensors & diffusers versions of the QR code conditioned ControlNet for Stable Diffusion v1.5.
20
+ The Stable Diffusion 2.1 version is marginally more effective, as it was developed to address my specific needs. However, this 1.5 version model was also trained on the same dataset for those who are using the older version.
21
+
22
+ ## How to use with Diffusers
23
+
24
+
25
+ ```bash
26
+ pip -q install diffusers transformers accelerate torch xformers
27
+ ```
28
+
29
+ ```python
30
+ import torch
31
+ from PIL import Image
32
+ from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel, DDIMScheduler
33
+ from diffusers.utils import load_image
34
+
35
+ controlnet = ControlNetModel.from_pretrained("DionTimmer/controlnet_qrcode-control_v1p_sd15",
36
+ torch_dtype=torch.float16)
37
+
38
+ pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
39
+ "runwayml/stable-diffusion-v1-5",
40
+ controlnet=controlnet,
41
+ safety_checker=None,
42
+ torch_dtype=torch.float16
43
+ )
44
+
45
+ pipe.enable_xformers_memory_efficient_attention()
46
+ pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
47
+ pipe.enable_model_cpu_offload()
48
+
49
+ def resize_for_condition_image(input_image: Image, resolution: int):
50
+ input_image = input_image.convert("RGB")
51
+ W, H = input_image.size
52
+ k = float(resolution) / min(H, W)
53
+ H *= k
54
+ W *= k
55
+ H = int(round(H / 64.0)) * 64
56
+ W = int(round(W / 64.0)) * 64
57
+ img = input_image.resize((W, H), resample=Image.LANCZOS)
58
+ return img
59
+
60
+
61
+ # play with guidance_scale, controlnet_conditioning_scale and strength to make a valid QR Code Image
62
+
63
+ # qr code image
64
+ source_image = load_image("https://s3.amazonaws.com/moonup/production/uploads/6064e095abd8d3692e3e2ed6/A_RqHaAM6YHBodPLwqtjn.png")
65
+ # initial image, anything
66
+ init_image = load_image("https://s3.amazonaws.com/moonup/production/uploads/noauth/KfMBABpOwIuNolv1pe3qX.jpeg")
67
+ condition_image = resize_for_condition_image(source_image, 768)
68
+ init_image = resize_for_condition_image(init_image, 768)
69
+ generator = torch.manual_seed(123121231)
70
+ image = pipe(prompt="a bilboard in NYC with a qrcode",
71
+ negative_prompt="ugly, disfigured, low quality, blurry, nsfw",
72
+ image=init_image,
73
+ control_image=condition_image,
74
+ width=768,
75
+ height=768,
76
+ guidance_scale=20,
77
+ controlnet_conditioning_scale=1.5,
78
+ generator=generator,
79
+ strength=0.9,
80
+ num_inference_steps=150,
81
+ )
82
+
83
+ image.images[0]
84
+
85
+ ```
86
+
87
+ ## Performance and Limitations
88
+
89
+ These models perform quite well in most cases, but please note that they are not 100% accurate. In some instances, the QR code shape might not come through as expected. You can increase the ControlNet weight to emphasize the QR code shape. However, be cautious as this might negatively impact the style of your output.**To optimize for scanning, please generate your QR codes with correction mode 'H' (30%).**
90
+
91
+ To balance between style and shape, a gentle fine-tuning of the control weight might be required based on the individual input and the desired output, aswell as the correct prompt. Some prompts do not work until you increase the weight by a lot. The process of finding the right balance between these factors is part art and part science. For the best results, it is recommended to generate your artwork at a resolution of 768. This allows for a higher level of detail in the final product, enhancing the quality and effectiveness of the QR code-based artwork.
92
+
93
+ ## Installation
94
+
95
+ The simplest way to use this is to place the .safetensors model and its .yaml config file in the folder where your other controlnet models are installed, which varies per application.
96
+ For usage in auto1111 they can be placed in the webui/models/ControlNet folder. They can be loaded using the controlnet webui extension which you can install through the extensions tab in the webui (https://github.com/Mikubill/sd-webui-controlnet). Make sure to enable your controlnet unit and set your input image as the QR code. Set the model to either the SD2.1 or 1.5 version depending on your base stable diffusion model, or it will error. No pre-processor is needed, though you can use the invert pre-processor for a different variation of results. 768 is the preferred resolution for generation since it allows for more detail.
97
+ Make sure to look up additional info on how to use controlnet if you get stuck, once you have the webui up and running its really easy to install the controlnet extension aswell.
config.json ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_class_name": "ControlNetModel",
3
+ "_diffusers_version": "0.18.0.dev0",
4
+ "act_fn": "silu",
5
+ "attention_head_dim": 8,
6
+ "block_out_channels": [
7
+ 320,
8
+ 640,
9
+ 1280,
10
+ 1280
11
+ ],
12
+ "class_embed_type": null,
13
+ "conditioning_embedding_out_channels": [
14
+ 16,
15
+ 32,
16
+ 96,
17
+ 256
18
+ ],
19
+ "controlnet_conditioning_channel_order": "rgb",
20
+ "cross_attention_dim": 768,
21
+ "down_block_types": [
22
+ "CrossAttnDownBlock2D",
23
+ "CrossAttnDownBlock2D",
24
+ "CrossAttnDownBlock2D",
25
+ "DownBlock2D"
26
+ ],
27
+ "downsample_padding": 1,
28
+ "flip_sin_to_cos": true,
29
+ "freq_shift": 0,
30
+ "global_pool_conditions": false,
31
+ "in_channels": 4,
32
+ "layers_per_block": 2,
33
+ "mid_block_scale_factor": 1,
34
+ "norm_eps": 1e-05,
35
+ "norm_num_groups": 32,
36
+ "num_class_embeds": null,
37
+ "only_cross_attention": false,
38
+ "projection_class_embeddings_input_dim": null,
39
+ "resnet_time_scale_shift": "default",
40
+ "upcast_attention": null,
41
+ "use_linear_projection": false
42
+ }
control_v1p_sd15_qrcode.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d8059dfffccfcaee5b827c4f0f126baf4d6ab66f9a80b2ec4ef0ed451a62ead
3
+ size 1445154814
control_v1p_sd15_qrcode.yaml ADDED
@@ -0,0 +1,79 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ model:
2
+ target: cldm.cldm.ControlLDM
3
+ params:
4
+ linear_start: 0.00085
5
+ linear_end: 0.0120
6
+ num_timesteps_cond: 1
7
+ log_every_t: 200
8
+ timesteps: 1000
9
+ first_stage_key: "jpg"
10
+ cond_stage_key: "txt"
11
+ control_key: "hint"
12
+ image_size: 64
13
+ channels: 4
14
+ cond_stage_trainable: false
15
+ conditioning_key: crossattn
16
+ monitor: val/loss_simple_ema
17
+ scale_factor: 0.18215
18
+ use_ema: False
19
+ only_mid_control: False
20
+
21
+ control_stage_config:
22
+ target: cldm.cldm.ControlNet
23
+ params:
24
+ image_size: 32 # unused
25
+ in_channels: 4
26
+ hint_channels: 3
27
+ model_channels: 320
28
+ attention_resolutions: [ 4, 2, 1 ]
29
+ num_res_blocks: 2
30
+ channel_mult: [ 1, 2, 4, 4 ]
31
+ num_heads: 8
32
+ use_spatial_transformer: True
33
+ transformer_depth: 1
34
+ context_dim: 768
35
+ use_checkpoint: True
36
+ legacy: False
37
+
38
+ unet_config:
39
+ target: cldm.cldm.ControlledUnetModel
40
+ params:
41
+ image_size: 32 # unused
42
+ in_channels: 4
43
+ out_channels: 4
44
+ model_channels: 320
45
+ attention_resolutions: [ 4, 2, 1 ]
46
+ num_res_blocks: 2
47
+ channel_mult: [ 1, 2, 4, 4 ]
48
+ num_heads: 8
49
+ use_spatial_transformer: True
50
+ transformer_depth: 1
51
+ context_dim: 768
52
+ use_checkpoint: True
53
+ legacy: False
54
+
55
+ first_stage_config:
56
+ target: ldm.models.autoencoder.AutoencoderKL
57
+ params:
58
+ embed_dim: 4
59
+ monitor: val/rec_loss
60
+ ddconfig:
61
+ double_z: true
62
+ z_channels: 4
63
+ resolution: 256
64
+ in_channels: 3
65
+ out_ch: 3
66
+ ch: 128
67
+ ch_mult:
68
+ - 1
69
+ - 2
70
+ - 4
71
+ - 4
72
+ num_res_blocks: 2
73
+ attn_resolutions: []
74
+ dropout: 0.0
75
+ lossconfig:
76
+ target: torch.nn.Identity
77
+
78
+ cond_stage_config:
79
+ target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
diffusion_pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a712800aa51ed13ab747b7d6523849ce736b3e760ca378eccbbd84356194f3cc
3
+ size 1445254969
diffusion_pytorch_model.fp16.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:afe344f75b44d61149255059e6db40a57c36f81c3a42f3e67d9f92d72b6f71df
3
+ size 722698343
diffusion_pytorch_model.fp16.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ca419d6e83c160a34d0af035a868ec39dc5c371142a41cd735ca15ada4145fd9
3
+ size 722598648
diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5b5a1379389660462b7b2c093d3a1b934ea410dc0275f68db1bd4cdf0f031e53
3
+ size 1445157120