Text-to-Image
Diffusers
Safetensors
English
edwixx committed · Commit 63590e6 · verified · 1 Parent(s): 73f8e43

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ examples/0.png filter=lfs diff=lfs merge=lfs -text
+ examples/1.png filter=lfs diff=lfs merge=lfs -text
+ examples/applications.png filter=lfs diff=lfs merge=lfs -text
ControlNetModel/config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "_class_name": "ControlNetModel",
+   "_diffusers_version": "0.21.2",
+   "_name_or_path": "/mnt/nj-aigc/usr/guiwan/workspace/diffusion_output/face_xl_ipc_v4_2_XiezhenAnimeForeigner/checkpoint-150000/ControlNetModel",
+   "act_fn": "silu",
+   "addition_embed_type": "text_time",
+   "addition_embed_type_num_heads": 64,
+   "addition_time_embed_dim": 256,
+   "attention_head_dim": [
+     5,
+     10,
+     20
+   ],
+   "block_out_channels": [
+     320,
+     640,
+     1280
+   ],
+   "class_embed_type": null,
+   "conditioning_channels": 3,
+   "conditioning_embedding_out_channels": [
+     16,
+     32,
+     96,
+     256
+   ],
+   "controlnet_conditioning_channel_order": "rgb",
+   "cross_attention_dim": 2048,
+   "down_block_types": [
+     "DownBlock2D",
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D"
+   ],
+   "downsample_padding": 1,
+   "encoder_hid_dim": null,
+   "encoder_hid_dim_type": null,
+   "flip_sin_to_cos": true,
+   "freq_shift": 0,
+   "global_pool_conditions": false,
+   "in_channels": 4,
+   "layers_per_block": 2,
+   "mid_block_scale_factor": 1,
+   "norm_eps": 1e-05,
+   "norm_num_groups": 32,
+   "num_attention_heads": null,
+   "num_class_embeds": null,
+   "only_cross_attention": false,
+   "projection_class_embeddings_input_dim": 2816,
+   "resnet_time_scale_shift": "default",
+   "transformer_layers_per_block": [
+     1,
+     2,
+     10
+   ],
+   "upcast_attention": null,
+   "use_linear_projection": true
+ }
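
As a rough sanity check on the config in this hunk, the sketch below parses a hand-copied subset of its fields and confirms that the per-block lists all describe the same number of down blocks. The `check_block_lists` helper and the choice of fields to compare are illustrative assumptions, not part of diffusers or of this commit.

```python
import json

# Subset of ControlNetModel/config.json, copied by hand from the diff.
CONFIG_SUBSET = """
{
  "attention_head_dim": [5, 10, 20],
  "block_out_channels": [320, 640, 1280],
  "down_block_types": ["DownBlock2D", "CrossAttnDownBlock2D", "CrossAttnDownBlock2D"],
  "transformer_layers_per_block": [1, 2, 10]
}
"""

# Fields that hold one entry per down block (an assumption made for this sketch).
PER_BLOCK_KEYS = (
    "attention_head_dim",
    "block_out_channels",
    "down_block_types",
    "transformer_layers_per_block",
)


def check_block_lists(config: dict) -> int:
    """Return the number of down blocks if all per-block lists agree (hypothetical helper)."""
    lengths = {key: len(config[key]) for key in PER_BLOCK_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"per-block lists disagree: {lengths}")
    return next(iter(lengths.values()))


config = json.loads(CONFIG_SUBSET)
num_blocks = check_block_lists(config)
print(f"config describes {num_blocks} down blocks")
```

A check like this only catches hand-editing mistakes in the JSON; it says nothing about whether the weights match the config.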
ControlNetModel/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c8127be9f174101ebdafee9964d856b49b634435cf6daa396d3f593cf0bbbb05
+ size 2502139136
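
The three added lines are a Git LFS pointer rather than the weights themselves. The following minimal sketch, assuming the simple `key value` layout shown in this hunk, parses such a pointer into its fields; `parse_lfs_pointer` is a hypothetical helper name, not a Git LFS or huggingface_hub API.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from the diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c8127be9f174101ebdafee9964d856b49b634435cf6daa396d3f593cf0bbbb05
size 2502139136
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], f"{info['size'] / 1e9:.2f} GB")
```

The parsed size and digest can be compared against a downloaded file to confirm the transfer completed.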
README.md ADDED
@@ -0,0 +1,128 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ library_name: diffusers
+ pipeline_tag: text-to-image
+ ---
+
+ # InstantID Model Card
+
+ <div align="center">
+
+ [**Project Page**](https://instantid.github.io/) **|** [**Paper**](https://arxiv.org/abs/2401.07519) **|** [**Code**](https://github.com/InstantID/InstantID) **|** [🤗 **Gradio demo**](https://huggingface.co/spaces/InstantX/InstantID)
+
+
+ </div>
+
+ ## Introduction
+
+ InstantID is a new state-of-the-art tuning-free method that achieves ID-preserving generation from only a single image and supports various downstream tasks.
+
+ <div align="center">
+ <img src='examples/applications.png'>
+ </div>
+
+
+ ## Usage
+
+ You can download the model directly from this repository.
+ You can also download the model with a Python script:
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ hf_hub_download(repo_id="InstantX/InstantID", filename="ControlNetModel/config.json", local_dir="./checkpoints")
+ hf_hub_download(repo_id="InstantX/InstantID", filename="ControlNetModel/diffusion_pytorch_model.safetensors", local_dir="./checkpoints")
+ hf_hub_download(repo_id="InstantX/InstantID", filename="ip-adapter.bin", local_dir="./checkpoints")
+ ```
+
+ For the face encoder, you need to manually download it via this [URL](https://github.com/deepinsight/insightface/issues/1896#issuecomment-1023867304) to `models/antelopev2`.
+
+ ```python
+ # !pip install opencv-python transformers accelerate insightface
+ import diffusers
+ from diffusers.utils import load_image
+ from diffusers.models import ControlNetModel
+
+ import cv2
+ import torch
+ import numpy as np
+ from PIL import Image
+
+ from insightface.app import FaceAnalysis
+ from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline, draw_kps
+
+ # prepare 'antelopev2' under ./models
+ app = FaceAnalysis(name='antelopev2', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
+ app.prepare(ctx_id=0, det_size=(640, 640))
+
+ # prepare models under ./checkpoints
+ face_adapter = './checkpoints/ip-adapter.bin'
+ controlnet_path = './checkpoints/ControlNetModel'
+
+ # load IdentityNet
+ controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
+
+ pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
+ )
+ pipe.cuda()
+
+ # load adapter
+ pipe.load_ip_adapter_instantid(face_adapter)
+ ```
+
+ Then, you can customize generation with your own face image:
+
+ ```python
+ # load an image
+ face_image = load_image("your-example.jpg")
+
+ # prepare face emb
+ face_info = app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
+ face_info = sorted(face_info, key=lambda x: (x['bbox'][2] - x['bbox'][0]) * (x['bbox'][3] - x['bbox'][1]))[-1]  # only use the largest face
+ face_emb = face_info['embedding']
+ face_kps = draw_kps(face_image, face_info['kps'])
+
+ pipe.set_ip_adapter_scale(0.8)
+
+ prompt = "analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality"
+ negative_prompt = "(lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured"
+
+ # generate image
+ image = pipe(
+     prompt, negative_prompt=negative_prompt, image_embeds=face_emb, image=face_kps, controlnet_conditioning_scale=0.8
+ ).images[0]
+ ```
+
+ For more details, please follow the instructions in our [GitHub repository](https://github.com/InstantID/InstantID).
+
+ ## Usage Tips
+ 1. If you're not satisfied with the similarity, try increasing the "IdentityNet Strength" and "Adapter Strength" weights.
+ 2. If the saturation is too high, first decrease the Adapter strength. If it is still too high, decrease the IdentityNet strength as well.
+ 3. If text control does not behave as expected, decrease the Adapter strength.
+ 4. If the realistic style is not good enough, see our GitHub repository and use a more realistic base model.
+
+ ## Demos
+
+ <div align="center">
+ <img src='examples/0.png'>
+ </div>
+
+ <div align="center">
+ <img src='examples/1.png'>
+ </div>
+
+ ## Disclaimer
+
+ This project is released under the Apache License and aims to positively impact the field of AI-driven image generation. Users are granted the freedom to create images using this tool, but they are obligated to comply with local laws and use it responsibly. The developers will not assume any responsibility for potential misuse by users.
+
+ ## Citation
+ ```bibtex
+ @article{wang2024instantid,
+   title={InstantID: Zero-shot Identity-Preserving Generation in Seconds},
+   author={Wang, Qixun and Bai, Xu and Wang, Haofan and Qin, Zekui and Chen, Anthony},
+   journal={arXiv preprint arXiv:2401.07519},
+   year={2024}
+ }
+ ```
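
The README snippet keeps only the largest detected face before extracting its embedding and keypoints. A minimal sketch of that selection step on plain dictionaries follows. It assumes each detection carries a `bbox` of `[x1, y1, x2, y2]`, as the indexing in the README suggests, and `bbox_area` and `largest_face` are hypothetical helper names. Both the width and the height differences need their own parentheses for the product to be an area.

```python
def bbox_area(face: dict) -> float:
    """Area of a detection's bounding box, assumed to be [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = face["bbox"]
    return (x2 - x1) * (y2 - y1)


def largest_face(faces: list) -> dict:
    """Return the detection with the largest bounding-box area (hypothetical helper)."""
    if not faces:
        raise ValueError("no face detected in the input image")
    return max(faces, key=bbox_area)


# Toy detections standing in for the output of a face detector.
faces = [
    {"name": "small", "bbox": [10, 10, 50, 60]},     # 40 x 50
    {"name": "large", "bbox": [100, 20, 300, 220]},  # 200 x 200
    {"name": "wide", "bbox": [0, 500, 300, 520]},    # 300 x 20
]

print(largest_face(faces)["name"])
```

Raising on an empty list makes the "no face found" case explicit instead of failing later with an index error.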
examples/0.png ADDED

Git LFS Details

  • SHA256: b02e16d938c007409c19783d494737230fe7eb890ac60b08267f0f46b9f17f6e
  • Pointer size: 132 Bytes
  • Size of remote file: 8.71 MB
examples/1.png ADDED

Git LFS Details

  • SHA256: f20e80f08c8efd2ac74ed93070851bea677489643dee8a28912fcd06b56348d2
  • Pointer size: 132 Bytes
  • Size of remote file: 8.47 MB
examples/applications.png ADDED

Git LFS Details

  • SHA256: 59fd297f4b20fcbc51fb100e40bed33a5d27e2b7351576d12f03d34c7608eb88
  • Pointer size: 133 Bytes
  • Size of remote file: 10.7 MB
ip-adapter.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02b3618e36d803784166660520098089a81388e61a93ef8002aa79a5b1c546e1
+ size 1691134141