Delete ckpt

- ckpt/.DS_Store +0 -0
- ckpt/ControlNet/body_pose_model.pth +0 -3
- ckpt/ControlNet/facenet.pth +0 -3
- ckpt/ControlNet/hand_pose_model.pth +0 -3
- ckpt/IMAGDressing-v1_512.pt +0 -3
- ckpt/buffalo_l.zip +0 -3
- ckpt/control_v11p_sd15_openpose/.gitattributes +0 -34
- ckpt/control_v11p_sd15_openpose/README.md +0 -163
- ckpt/control_v11p_sd15_openpose/config.json +0 -42
- ckpt/control_v11p_sd15_openpose/control_net_open_pose.py +0 -60
- ckpt/control_v11p_sd15_openpose/diffusion_pytorch_model.bin +0 -3
- ckpt/control_v11p_sd15_openpose/diffusion_pytorch_model.fp16.bin +0 -3
- ckpt/control_v11p_sd15_openpose/diffusion_pytorch_model.fp16.safetensors +0 -3
- ckpt/control_v11p_sd15_openpose/diffusion_pytorch_model.safetensors +0 -3
- ckpt/control_v11p_sd15_openpose/images/control.png +0 -0
- ckpt/control_v11p_sd15_openpose/images/image_out.png +0 -0
- ckpt/control_v11p_sd15_openpose/images/input.png +0 -0
- ckpt/control_v11p_sd15_openpose/sd.png +0 -0
- ckpt/image_encoder/.DS_Store +0 -0
- ckpt/image_encoder/config.json +0 -23
- ckpt/image_encoder/model.safetensors +0 -3
- ckpt/image_encoder/pytorch_model.bin +0 -3
- ckpt/ip-adapter-faceid-plus_sd15.bin +0 -3
- ckpt/scheduler/scheduler_config.json +0 -21
- ckpt/sd-vae-ft-mse/.gitattributes +0 -33
- ckpt/sd-vae-ft-mse/README.md +0 -83
- ckpt/sd-vae-ft-mse/config.json +0 -29
- ckpt/sd-vae-ft-mse/diffusion_pytorch_model.bin +0 -3
- ckpt/sd-vae-ft-mse/diffusion_pytorch_model.safetensors +0 -3
- ckpt/text_encoder/config.json +0 -25
- ckpt/text_encoder/model.safetensors +0 -3
- ckpt/tokenizer/merges.txt +0 -0
- ckpt/tokenizer/special_tokens_map.json +0 -24
- ckpt/tokenizer/tokenizer_config.json +0 -33
- ckpt/tokenizer/vocab.json +0 -0
- ckpt/unet/config.json +0 -60
- ckpt/unet/diffusion_pytorch_model.safetensors +0 -3
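
Taken together, this commit removes on the order of 18 GB of Git LFS-tracked weights from the Space repository. A plausible follow-up (not recorded in this commit) is to fetch the checkpoints at startup instead of tracking them in-repo; below is a minimal sketch using `huggingface_hub`, where the upstream repo IDs are assumptions rather than anything stated here:

```python
# Hypothetical startup hook: re-fetch the deleted checkpoints at runtime.
# The source repo IDs below are assumptions; this commit only records deletions.
from huggingface_hub import hf_hub_download, snapshot_download

snapshot_download("lllyasviel/control_v11p_sd15_openpose",
                  local_dir="ckpt/control_v11p_sd15_openpose")
snapshot_download("stabilityai/sd-vae-ft-mse", local_dir="ckpt/sd-vae-ft-mse")
hf_hub_download("h94/IP-Adapter-FaceID", "ip-adapter-faceid-plus_sd15.bin",
                local_dir="ckpt")
```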
ckpt/.DS_Store
DELETED
Binary file (6.15 kB)
ckpt/ControlNet/body_pose_model.pth
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:25a948c16078b0f08e236bda51a385d855ef4c153598947c28c0d47ed94bb746
-size 209267595

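Each weight file in this commit was stored as a Git LFS pointer, so its diff removes only the three-line stub (spec version, SHA-256 object ID, byte size) rather than inline binary data. A small sketch of reading such a stub; the path is just one of the files deleted here:

```python
# Parse a Git LFS pointer stub of the form shown in the diffs above and below:
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:<hex digest>
#   size <bytes>
def parse_lfs_pointer(path: str) -> dict:
    fields = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields

ptr = parse_lfs_pointer("ckpt/ControlNet/body_pose_model.pth")
print(ptr["oid"])   # sha256:25a948c16078b0f08e236bda51a385d855ef4c153598947c28c0d47ed94bb746
print(ptr["size"])  # 209267595 (bytes, roughly 200 MB)
```
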
ckpt/ControlNet/facenet.pth
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:8beb52e548624ffcc4aed12af7aee7dcbfaeea420c75609fee999fe7add79d43
-size 153718792

ckpt/ControlNet/hand_pose_model.pth
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b76b00d1750901abd07b9f9d8c98cc3385b8fe834a26d4b4f0aad439e75fc600
-size 147341049

ckpt/IMAGDressing-v1_512.pt
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:c37a38119c1420735345013ce79b28f927f877267297780b6669f0faa9701ce6
-size 3547959907

ckpt/buffalo_l.zip
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:80ffe37d8a5940d59a7384c201a2a38d4741f2f3c51eef46ebb28218a7b0ca2f
-size 288621354

ckpt/control_v11p_sd15_openpose/.gitattributes
DELETED
@@ -1,34 +0,0 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text

ckpt/control_v11p_sd15_openpose/README.md
DELETED
@@ -1,163 +0,0 @@
----
-license: openrail
-base_model: runwayml/stable-diffusion-v1-5
-tags:
-- art
-- controlnet
-- stable-diffusion
-- controlnet-v1-1
-- image-to-image
-duplicated_from: ControlNet-1-1-preview/control_v11p_sd15_openpose
----
-
-# Controlnet - v1.1 - *openpose Version*
-
-**Controlnet v1.1** is the successor model of [Controlnet v1.0](https://huggingface.co/lllyasviel/ControlNet)
-and was released in [lllyasviel/ControlNet-v1-1](https://huggingface.co/lllyasviel/ControlNet-v1-1) by [Lvmin Zhang](https://huggingface.co/lllyasviel).
-
-This checkpoint is a conversion of [the original checkpoint](https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_openpose.pth) into `diffusers` format.
-It can be used in combination with **Stable Diffusion**, such as [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5).
-
-
-For more details, please also have a look at the [🧨 Diffusers docs](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/controlnet).
-
-
-ControlNet is a neural network structure to control diffusion models by adding extra conditions.
-
-![img](./sd.png)
-
-This checkpoint corresponds to the ControlNet conditioned on **openpose images**.
-
-## Model Details
-- **Developed by:** Lvmin Zhang, Maneesh Agrawala
-- **Model type:** Diffusion-based text-to-image generation model
-- **Language(s):** English
-- **License:** [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based.
-- **Resources for more information:** [GitHub Repository](https://github.com/lllyasviel/ControlNet), [Paper](https://arxiv.org/abs/2302.05543).
-- **Cite as:**
-
-@misc{zhang2023adding,
-title={Adding Conditional Control to Text-to-Image Diffusion Models},
-author={Lvmin Zhang and Maneesh Agrawala},
-year={2023},
-eprint={2302.05543},
-archivePrefix={arXiv},
-primaryClass={cs.CV}
-}
-
-## Introduction
-
-Controlnet was proposed in [*Adding Conditional Control to Text-to-Image Diffusion Models*](https://arxiv.org/abs/2302.05543) by
-Lvmin Zhang, Maneesh Agrawala.
-
-The abstract reads as follows:
-
-*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions.
-The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k).
-Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices.
-Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data.
-We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc.
-This may enrich the methods to control large diffusion models and further facilitate related applications.*
-
-## Example
-
-It is recommended to use the checkpoint with [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) as the checkpoint
-has been trained on it.
-Experimentally, the checkpoint can be used with other diffusion models such as dreamboothed stable diffusion.
-
-**Note**: If you want to process an image to create the auxiliary conditioning, external dependencies are required as shown below:
-
-1. Install https://github.com/patrickvonplaten/controlnet_aux
-
-```sh
-$ pip install controlnet_aux==0.3.0
-```
-
-2. Let's install `diffusers` and related packages:
-
-```
-$ pip install diffusers transformers accelerate
-```
-
-3. Run code:
-
-```python
-import torch
-import os
-from huggingface_hub import HfApi
-from pathlib import Path
-from diffusers.utils import load_image
-from PIL import Image
-import numpy as np
-from controlnet_aux import OpenposeDetector
-
-from diffusers import (
-    ControlNetModel,
-    StableDiffusionControlNetPipeline,
-    UniPCMultistepScheduler,
-)
-
-checkpoint = "lllyasviel/control_v11p_sd15_openpose"
-
-image = load_image(
-    "https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/input.png"
-)
-
-prompt = "chef in the kitchen"
-
-processor = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
-
-control_image = processor(image, hand_and_face=True)
-control_image.save("./images/control.png")
-
-controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
-pipe = StableDiffusionControlNetPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
-)
-
-pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
-pipe.enable_model_cpu_offload()
-
-generator = torch.manual_seed(0)
-image = pipe(prompt, num_inference_steps=30, generator=generator, image=control_image).images[0]
-
-image.save('images/image_out.png')
-
-```
-
-![bird](./images/input.png)
-
-![bird_canny](./images/control.png)
-
-![bird_canny_out](./images/image_out.png)
-
-## Other released checkpoints v1-1
-
-The authors released 14 different checkpoints, each trained with [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
-on a different type of conditioning:
-
-| Model Name | Control Image Overview| Control Image Example | Generated Image Example |
-|---|---|---|---|
-|[lllyasviel/control_v11p_sd15_canny](https://huggingface.co/lllyasviel/control_v11p_sd15_canny)<br/> *Trained with canny edge detection* | A monochrome image with white edges on a black background.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11e_sd15_ip2p](https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p)<br/> *Trained with pixel to pixel instruction* | No condition .|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11p_sd15_inpaint](https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint)<br/> Trained with image inpainting | No condition.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/output.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/output.png"/></a>|
-|[lllyasviel/control_v11p_sd15_mlsd](https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd)<br/> Trained with multi-level line segment detection | An image with annotated line segments.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11f1p_sd15_depth](https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth)<br/> Trained with depth estimation | An image with depth information, usually represented as a grayscale image.|<a href="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11f1p_sd15_depth/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11p_sd15_normalbae](https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae)<br/> Trained with surface normal estimation | An image with surface normal information, usually represented as a color-coded image.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_normalbae/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11p_sd15_seg](https://huggingface.co/lllyasviel/control_v11p_sd15_seg)<br/> Trained with image segmentation | An image with segmented regions, usually represented as a color-coded image.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_seg/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11p_sd15_lineart](https://huggingface.co/lllyasviel/control_v11p_sd15_lineart)<br/> Trained with line art generation | An image with line art, usually black lines on a white background.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_lineart/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11p_sd15s2_lineart_anime](https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime)<br/> Trained with anime line art generation | An image with anime-style line art.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15s2_lineart_anime/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11p_sd15_openpose](https://huggingface.co/lllyasviel/control_v11p_sd15_openpose)<br/> Trained with human pose estimation | An image with human poses, usually represented as a set of keypoints or skeletons.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11p_sd15_scribble](https://huggingface.co/lllyasviel/control_v11p_sd15_scribble)<br/> Trained with scribble-based image generation | An image with scribbles, usually random or user-drawn strokes.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_scribble/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11p_sd15_softedge](https://huggingface.co/lllyasviel/control_v11p_sd15_softedge)<br/> Trained with soft edge image generation | An image with soft edges, usually to create a more painterly or artistic effect.|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11p_sd15_softedge/resolve/main/images/image_out.png"/></a>|
-|[lllyasviel/control_v11e_sd15_shuffle](https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle)<br/> Trained with image shuffling | An image with shuffled patches or regions.|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/control.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/control.png"/></a>|<a href="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/image_out.png"><img width="64" src="https://huggingface.co/lllyasviel/control_v11e_sd15_shuffle/resolve/main/images/image_out.png"/></a>|
-
-## Improvements in Openpose 1.1:
-
-- The improvement of this model is mainly based on our improved implementation of OpenPose. We carefully reviewed the difference between the pytorch OpenPose and CMU's c++ openpose. Now the processor should be more accurate, especially for hands. The improvement of processor leads to the improvement of Openpose 1.1.
-- More inputs are supported (hand and face).
-- The training dataset of previous cnet 1.0 has several problems including (1) a small group of greyscale human images are duplicated thousands of times (!!), causing the previous model somewhat likely to generate grayscale human images; (2) some images has low quality, very blurry, or significant JPEG artifacts; (3) a small group of images has wrong paired prompts caused by a mistake in our data processing scripts. The new model fixed all problems of the training dataset and should be more reasonable in many cases.
-
-## More information
-
-For more information, please also have a look at the [Diffusers ControlNet Blog Post](https://huggingface.co/blog/controlnet) and have a look at the [official docs](https://github.com/lllyasviel/ControlNet-v1-1-nightly).

ckpt/control_v11p_sd15_openpose/config.json
DELETED
@@ -1,42 +0,0 @@
-{
-  "_class_name": "ControlNetModel",
-  "_diffusers_version": "0.16.0.dev0",
-  "_name_or_path": "/home/patrick/controlnet_v1_1/control_v11p_sd15_openpose",
-  "act_fn": "silu",
-  "attention_head_dim": 8,
-  "block_out_channels": [
-    320,
-    640,
-    1280,
-    1280
-  ],
-  "class_embed_type": null,
-  "conditioning_embedding_out_channels": [
-    16,
-    32,
-    96,
-    256
-  ],
-  "controlnet_conditioning_channel_order": "rgb",
-  "cross_attention_dim": 768,
-  "down_block_types": [
-    "CrossAttnDownBlock2D",
-    "CrossAttnDownBlock2D",
-    "CrossAttnDownBlock2D",
-    "DownBlock2D"
-  ],
-  "downsample_padding": 1,
-  "flip_sin_to_cos": true,
-  "freq_shift": 0,
-  "in_channels": 4,
-  "layers_per_block": 2,
-  "mid_block_scale_factor": 1,
-  "norm_eps": 1e-05,
-  "norm_num_groups": 32,
-  "num_class_embeds": null,
-  "only_cross_attention": false,
-  "projection_class_embeddings_input_dim": null,
-  "resnet_time_scale_shift": "default",
-  "upcast_attention": false,
-  "use_linear_projection": false
-}

ckpt/control_v11p_sd15_openpose/control_net_open_pose.py
DELETED
@@ -1,60 +0,0 @@
-#!/usr/bin/env python3
-import torch
-import os
-from huggingface_hub import HfApi
-from pathlib import Path
-from diffusers.utils import load_image
-from controlnet_aux import OpenposeDetector
-
-from diffusers import (
-    ControlNetModel,
-    StableDiffusionControlNetPipeline,
-    UniPCMultistepScheduler,
-)
-import sys
-
-checkpoint = sys.argv[1]
-
-<<<<<<< HEAD
-image = load_image("https://github.com/lllyasviel/ControlNet-v1-1-nightly/raw/main/test_imgs/demo.jpg").resize((512, 512))
-prompt = "The pope with sunglasses rapping with a mic"
-
-
-openpose = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
-image = openpose(image, hand_and_face=True)
-=======
-image = load_image("https://huggingface.co/lllyasviel/sd-controlnet-openpose/resolve/main/images/pose.png")
-prompt = "chef in the kitchen"
-
-
-openpose = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
-image = openpose(image)
->>>>>>> 6e2c3bc1a649ac194d79bb2f4ee11900d7f0e8f6
-
-controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
-pipe = StableDiffusionControlNetPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
-)
-
-pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
-pipe.enable_model_cpu_offload()
-
-generator = torch.manual_seed(33)
-<<<<<<< HEAD
-out_image = pipe(prompt, num_inference_steps=35, generator=generator, image=image).images[0]
-=======
-out_image = pipe(prompt, num_inference_steps=20, generator=generator, image=image).images[0]
->>>>>>> 6e2c3bc1a649ac194d79bb2f4ee11900d7f0e8f6
-
-path = os.path.join(Path.home(), "images", "aa.png")
-out_image.save(path)
-
-api = HfApi()
-
-api.upload_file(
-    path_or_fileobj=path,
-    path_in_repo=path.split("/")[-1],
-    repo_id="patrickvonplaten/images",
-    repo_type="dataset",
-)
-print("https://huggingface.co/datasets/patrickvonplaten/images/blob/main/aa.png")

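Note that the deleted script above was committed with unresolved merge-conflict markers (`<<<<<<< HEAD` / `=======` / `>>>>>>>`). For reference, a resolved sketch that keeps the HEAD side of each conflict (hand-and-face OpenPose, 35 steps) and omits the Hub upload step:

```python
#!/usr/bin/env python3
# Resolved sketch of the deleted control_net_open_pose.py, assuming the
# HEAD side of each conflict is the intended one; upload step omitted.
import sys

import torch
from controlnet_aux import OpenposeDetector
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.utils import load_image

checkpoint = sys.argv[1]

image = load_image(
    "https://github.com/lllyasviel/ControlNet-v1-1-nightly/raw/main/test_imgs/demo.jpg"
).resize((512, 512))
prompt = "The pope with sunglasses rapping with a mic"

# Extract a pose map (including hands and face) to condition generation on.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
control_image = openpose(image, hand_and_face=True)

controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

generator = torch.manual_seed(33)
out_image = pipe(prompt, num_inference_steps=35, generator=generator,
                 image=control_image).images[0]
out_image.save("aa.png")
```
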
ckpt/control_v11p_sd15_openpose/diffusion_pytorch_model.bin
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:40c80b93aea10c31de2d282adbe8bbb945611a037ca36e0cd55d3ee7d59fedce
-size 1445254969

ckpt/control_v11p_sd15_openpose/diffusion_pytorch_model.fp16.bin
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:65c13c04dc49231f7044373e3f0dbd2f44b01a445c8577ea919cd5ff5fac29a6
-size 722698343

ckpt/control_v11p_sd15_openpose/diffusion_pytorch_model.fp16.safetensors
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b25b1125e870275550b2a7de289056cb3c236c01c293bd5ba883657b1c006e3e
-size 722598642

ckpt/control_v11p_sd15_openpose/diffusion_pytorch_model.safetensors
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:46b10abb28f3750aba7eea208e188539f7945d9256de9a248cbb9902f2276988
-size 1445157124

ckpt/control_v11p_sd15_openpose/images/control.png
DELETED
Binary file (8.41 kB)

ckpt/control_v11p_sd15_openpose/images/image_out.png
DELETED
Binary file (655 kB)

ckpt/control_v11p_sd15_openpose/images/input.png
DELETED
Binary file (16.3 kB)

ckpt/control_v11p_sd15_openpose/sd.png
DELETED
Binary file (59.5 kB)

ckpt/image_encoder/.DS_Store
DELETED
Binary file (6.15 kB)

ckpt/image_encoder/config.json
DELETED
@@ -1,23 +0,0 @@
-{
-  "_name_or_path": "./image_encoder",
-  "architectures": [
-    "CLIPVisionModelWithProjection"
-  ],
-  "attention_dropout": 0.0,
-  "dropout": 0.0,
-  "hidden_act": "gelu",
-  "hidden_size": 1280,
-  "image_size": 224,
-  "initializer_factor": 1.0,
-  "initializer_range": 0.02,
-  "intermediate_size": 5120,
-  "layer_norm_eps": 1e-05,
-  "model_type": "clip_vision_model",
-  "num_attention_heads": 16,
-  "num_channels": 3,
-  "num_hidden_layers": 32,
-  "patch_size": 14,
-  "projection_dim": 1024,
-  "torch_dtype": "float16",
-  "transformers_version": "4.28.0.dev0"
-}

ckpt/image_encoder/model.safetensors
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:6ca9667da1ca9e0b0f75e46bb030f7e011f44f86cbfb8d5a36590fcd7507b030
-size 2528373448

ckpt/image_encoder/pytorch_model.bin
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:3d3ec1e66737f77a4f3bc2df3c52eacefc69ce7825e2784183b1d4e9877d9193
-size 2528481905

ckpt/ip-adapter-faceid-plus_sd15.bin
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:252fb53e0d018489d9e7f9b9e2001a52ff700e491894011ada7cfb471e0fadf2
-size 156558503

ckpt/scheduler/scheduler_config.json
DELETED
@@ -1,21 +0,0 @@
-{
-  "_class_name": "DEISMultistepScheduler",
-  "_diffusers_version": "0.16.1",
-  "algorithm_type": "deis",
-  "beta_end": 0.012,
-  "beta_schedule": "scaled_linear",
-  "beta_start": 0.00085,
-  "clip_sample": false,
-  "clip_sample_range": 1.0,
-  "dynamic_thresholding_ratio": 0.995,
-  "lower_order_final": true,
-  "num_train_timesteps": 1000,
-  "prediction_type": "epsilon",
-  "sample_max_value": 1.0,
-  "set_alpha_to_one": false,
-  "solver_order": 2,
-  "solver_type": "logrho",
-  "steps_offset": 1,
-  "thresholding": false,
-  "trained_betas": null
-}

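The deleted config describes a `DEISMultistepScheduler`. If a copy of the `ckpt/` tree is restored locally, the scheduler can be rebuilt with diffusers (a sketch, assuming diffusers >= 0.16):

```python
from diffusers import DEISMultistepScheduler

# Rebuild the scheduler from the deleted JSON; the local path assumes
# a restored copy of the ckpt/ tree shown in this commit.
scheduler = DEISMultistepScheduler.from_pretrained("ckpt", subfolder="scheduler")
print(scheduler.config.algorithm_type)  # "deis"
```
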
ckpt/sd-vae-ft-mse/.gitattributes
DELETED
@@ -1,33 +0,0 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
-diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text

ckpt/sd-vae-ft-mse/README.md
DELETED
@@ -1,83 +0,0 @@
----
-license: mit
-tags:
-- stable-diffusion
-- stable-diffusion-diffusers
-inference: false
----
-# Improved Autoencoders
-
-## Utilizing
-These weights are intended to be used with the [🧨 diffusers library](https://github.com/huggingface/diffusers). If you are looking for the model to use with the original [CompVis Stable Diffusion codebase](https://github.com/CompVis/stable-diffusion), [come here](https://huggingface.co/stabilityai/sd-vae-ft-mse-original).
-
-#### How to use with 🧨 diffusers
-You can integrate this fine-tuned VAE decoder to your existing `diffusers` workflows, by including a `vae` argument to the `StableDiffusionPipeline`
-```py
-from diffusers.models import AutoencoderKL
-from diffusers import StableDiffusionPipeline
-
-model = "CompVis/stable-diffusion-v1-4"
-vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
-pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)
-```
-
-## Decoder Finetuning
-We publish two kl-f8 autoencoder versions, finetuned from the original [kl-f8 autoencoder](https://github.com/CompVis/latent-diffusion#pretrained-autoencoding-models) on a 1:1 ratio of [LAION-Aesthetics](https://laion.ai/blog/laion-aesthetics/) and LAION-Humans, an unreleased subset containing only SFW images of humans. The intent was to fine-tune on the Stable Diffusion training set (the autoencoder was originally trained on OpenImages) but also enrich the dataset with images of humans to improve the reconstruction of faces.
-The first, _ft-EMA_, was resumed from the original checkpoint, trained for 313198 steps and uses EMA weights. It uses the same loss configuration as the original checkpoint (L1 + LPIPS).
-The second, _ft-MSE_, was resumed from _ft-EMA_ and uses EMA weights and was trained for another 280k steps using a different loss, with more emphasis
-on MSE reconstruction (MSE + 0.1 * LPIPS). It produces somewhat ``smoother'' outputs. The batch size for both versions was 192 (16 A100s, batch size 12 per GPU).
-To keep compatibility with existing models, only the decoder part was finetuned; the checkpoints can be used as a drop-in replacement for the existing autoencoder.
-
-_Original kl-f8 VAE vs f8-ft-EMA vs f8-ft-MSE_
-
-## Evaluation
-### COCO 2017 (256x256, val, 5000 images)
-| Model | train steps | rFID | PSNR | SSIM | PSIM | Link | Comments
-|----------|---------|------|--------------|---------------|---------------|-----------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|
-| | | | | | | | |
-| original | 246803 | 4.99 | 23.4 +/- 3.8 | 0.69 +/- 0.14 | 1.01 +/- 0.28 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | as used in SD |
-| ft-EMA | 560001 | 4.42 | 23.8 +/- 3.9 | 0.69 +/- 0.13 | 0.96 +/- 0.27 | https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt | slightly better overall, with EMA |
-| ft-MSE | 840001 | 4.70 | 24.5 +/- 3.7 | 0.71 +/- 0.13 | 0.92 +/- 0.27 | https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 * LPIPS), smoother outputs |
-
-
-### LAION-Aesthetics 5+ (256x256, subset, 10000 images)
-| Model | train steps | rFID | PSNR | SSIM | PSIM | Link | Comments
-|----------|-----------|------|--------------|---------------|---------------|-----------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|
-| | | | | | | | |
-| original | 246803 | 2.61 | 26.0 +/- 4.4 | 0.81 +/- 0.12 | 0.75 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | as used in SD |
-| ft-EMA | 560001 | 1.77 | 26.7 +/- 4.8 | 0.82 +/- 0.12 | 0.67 +/- 0.34 | https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt | slightly better overall, with EMA |
-| ft-MSE | 840001 | 1.88 | 27.3 +/- 4.7 | 0.83 +/- 0.11 | 0.65 +/- 0.34 | https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 * LPIPS), smoother outputs |
-
-
-### Visual
-_Visualization of reconstructions on 256x256 images from the COCO2017 validation dataset._
-
-<p align="center">
-<br>
-<b>
-256x256: ft-EMA (left), ft-MSE (middle), original (right)</b>
-</p>
-
-<p align="center">
-<img src=https://huggingface.co/stabilityai/stable-diffusion-decoder-finetune/resolve/main/eval/ae-decoder-tuning-reconstructions/merged/00025_merged.png />
-</p>
-
-<p align="center">
-<img src=https://huggingface.co/stabilityai/stable-diffusion-decoder-finetune/resolve/main/eval/ae-decoder-tuning-reconstructions/merged/00011_merged.png />
-</p>
-
-<p align="center">
-<img src=https://huggingface.co/stabilityai/stable-diffusion-decoder-finetune/resolve/main/eval/ae-decoder-tuning-reconstructions/merged/00037_merged.png />
-</p>
-
-<p align="center">
-<img src=https://huggingface.co/stabilityai/stable-diffusion-decoder-finetune/resolve/main/eval/ae-decoder-tuning-reconstructions/merged/00043_merged.png />
-</p>
-
-<p align="center">
-<img src=https://huggingface.co/stabilityai/stable-diffusion-decoder-finetune/resolve/main/eval/ae-decoder-tuning-reconstructions/merged/00053_merged.png />
-</p>
-
-<p align="center">
-<img src=https://huggingface.co/stabilityai/stable-diffusion-decoder-finetune/resolve/main/eval/ae-decoder-tuning-reconstructions/merged/00029_merged.png />
-</p>

ckpt/sd-vae-ft-mse/config.json
DELETED
@@ -1,29 +0,0 @@
-{
-  "_class_name": "AutoencoderKL",
-  "_diffusers_version": "0.4.2",
-  "act_fn": "silu",
-  "block_out_channels": [
-    128,
-    256,
-    512,
-    512
-  ],
-  "down_block_types": [
-    "DownEncoderBlock2D",
-    "DownEncoderBlock2D",
-    "DownEncoderBlock2D",
-    "DownEncoderBlock2D"
-  ],
-  "in_channels": 3,
-  "latent_channels": 4,
-  "layers_per_block": 2,
-  "norm_num_groups": 32,
-  "out_channels": 3,
-  "sample_size": 256,
-  "up_block_types": [
-    "UpDecoderBlock2D",
-    "UpDecoderBlock2D",
-    "UpDecoderBlock2D",
-    "UpDecoderBlock2D"
-  ]
-}

ckpt/sd-vae-ft-mse/diffusion_pytorch_model.bin
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:1b4889b6b1d4ce7ae320a02dedaeff1780ad77d415ea0d744b476155c6377ddc
-size 334707217

ckpt/sd-vae-ft-mse/diffusion_pytorch_model.safetensors
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:a1d993488569e928462932c8c38a0760b874d166399b14414135bd9c42df5815
-size 334643276

ckpt/text_encoder/config.json
DELETED
@@ -1,25 +0,0 @@
-{
-  "_name_or_path": "openai/clip-vit-large-patch14",
-  "architectures": [
-    "CLIPTextModel"
-  ],
-  "attention_dropout": 0.0,
-  "bos_token_id": 0,
-  "dropout": 0.0,
-  "eos_token_id": 2,
-  "hidden_act": "quick_gelu",
-  "hidden_size": 768,
-  "initializer_factor": 1.0,
-  "initializer_range": 0.02,
-  "intermediate_size": 3072,
-  "layer_norm_eps": 1e-05,
-  "max_position_embeddings": 77,
-  "model_type": "clip_text_model",
-  "num_attention_heads": 12,
-  "num_hidden_layers": 12,
-  "pad_token_id": 1,
-  "projection_dim": 768,
-  "torch_dtype": "float32",
-  "transformers_version": "4.30.2",
-  "vocab_size": 49408
-}

ckpt/text_encoder/model.safetensors
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:8ee67788c3c9a6f5d73de077b644f1d4317258b55fbcc372dc385e8e5587c1cc
-size 492265880

ckpt/tokenizer/merges.txt
DELETED
The diff for this file is too large to render.
ckpt/tokenizer/special_tokens_map.json
DELETED
@@ -1,24 +0,0 @@
-{
-  "bos_token": {
-    "content": "<|startoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  },
-  "eos_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  },
-  "pad_token": "<|endoftext|>",
-  "unk_token": {
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  }
-}

ckpt/tokenizer/tokenizer_config.json
DELETED
@@ -1,33 +0,0 @@
-{
-  "add_prefix_space": false,
-  "bos_token": {
-    "__type": "AddedToken",
-    "content": "<|startoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  },
-  "clean_up_tokenization_spaces": true,
-  "do_lower_case": true,
-  "eos_token": {
-    "__type": "AddedToken",
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  },
-  "errors": "replace",
-  "model_max_length": 77,
-  "pad_token": "<|endoftext|>",
-  "tokenizer_class": "CLIPTokenizer",
-  "unk_token": {
-    "__type": "AddedToken",
-    "content": "<|endoftext|>",
-    "lstrip": false,
-    "normalized": true,
-    "rstrip": false,
-    "single_word": false
-  }
-}

ckpt/tokenizer/vocab.json
DELETED
The diff for this file is too large to render.
ckpt/unet/config.json
DELETED
@@ -1,60 +0,0 @@
-{
-  "_class_name": "UNet2DConditionModel",
-  "_diffusers_version": "0.16.1",
-  "act_fn": "silu",
-  "addition_embed_type": null,
-  "addition_embed_type_num_heads": 64,
-  "attention_head_dim": 8,
-  "block_out_channels": [
-    320,
-    640,
-    1280,
-    1280
-  ],
-  "center_input_sample": false,
-  "class_embed_type": null,
-  "class_embeddings_concat": false,
-  "conv_in_kernel": 3,
-  "conv_out_kernel": 3,
-  "cross_attention_dim": 768,
-  "cross_attention_norm": null,
-  "down_block_types": [
-    "CrossAttnDownBlock2D",
-    "CrossAttnDownBlock2D",
-    "CrossAttnDownBlock2D",
-    "DownBlock2D"
-  ],
-  "downsample_padding": 1,
-  "dual_cross_attention": false,
-  "encoder_hid_dim": null,
-  "flip_sin_to_cos": true,
-  "freq_shift": 0,
-  "in_channels": 4,
-  "layers_per_block": 2,
-  "mid_block_only_cross_attention": null,
-  "mid_block_scale_factor": 1,
-  "mid_block_type": "UNetMidBlock2DCrossAttn",
-  "norm_eps": 1e-05,
-  "norm_num_groups": 32,
-  "num_class_embeds": null,
-  "only_cross_attention": false,
-  "out_channels": 4,
-  "projection_class_embeddings_input_dim": null,
-  "resnet_out_scale_factor": 1.0,
-  "resnet_skip_time_act": false,
-  "resnet_time_scale_shift": "default",
-  "sample_size": 64,
-  "time_cond_proj_dim": null,
-  "time_embedding_act_fn": null,
-  "time_embedding_dim": null,
-  "time_embedding_type": "positional",
-  "timestep_post_act": null,
-  "up_block_types": [
-    "UpBlock2D",
-    "CrossAttnUpBlock2D",
-    "CrossAttnUpBlock2D",
-    "CrossAttnUpBlock2D"
-  ],
-  "upcast_attention": false,
-  "use_linear_projection": false
-}

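For reference, the deleted UNet config alone is enough to instantiate the Stable Diffusion 1.5 UNet architecture (with random weights); a sketch assuming a local copy of the JSON:

```python
import json

from diffusers import UNet2DConditionModel

# Build the architecture from config only; actual weights live in the
# 3.4 GB safetensors file whose LFS pointer is deleted just below.
with open("ckpt/unet/config.json") as f:
    config = json.load(f)
unet = UNet2DConditionModel.from_config(config)
print(sum(p.numel() for p in unet.parameters()))  # ~860M parameters
```
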
ckpt/unet/diffusion_pytorch_model.safetensors
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:f75956623c8f95b40e62b3ee45f5dcba8e353b53c33c6765e517c5a8bb3dfbfe
-size 3438167536