---
title: Pictero.com
emoji: 💻
colorFrom: gray
colorTo: yellow
sdk: gradio
sdk_version: 4.25.0
app_file: app.py
pinned: false
---

This Space runs on a free Hugging Face Space and therefore only provides limited CPU power. To use your own hardware, run it locally.

# Local installation

First clone the source code:

```bash
git clone https://huggingface.co/spaces/n42/pictero
```

Then install all required libraries, preferably inside a virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Then run the web app either via Python itself or through Gradio:

```bash
python app.py
```

Alternatively, you can run it through Gradio, which watches the source files and hot-reloads the app whenever you make changes:

```bash
gradio app.py
```

# Walk-Through

## Patience is king

If you start the process for a given model for the first time, it may take a while, because the backend needs to download the full model. Depending on the model, this requires multiple gigabytes of space on your device. This happens only once, except on Hugging Face, where the server cache is purged from time to time.

## Steps

These are the minimal steps required to use this interface. Most default parameters do not need to be changed for it to work properly. A sketch of the roughly equivalent diffusers calls follows the list.

1. Select a device. A decent GPU is recommended.
2. Choose a model. If a fine-tuned model requires a trigger token, it is added automatically. The safety checker option prevents NSFW content from being rendered, but in some cases it also produces black images for harmless content.
3. You may select a different scheduler. The scheduler trades off quality against performance.
4. Now you can define your prompt and negative prompt.
5. The value for inference steps controls how many iterations the generation process runs. The higher this value, the longer the process takes and the better the image quality. Start with a lower value to see how the model interprets your prompt; once you have a satisfying result, increase it to produce high-quality output.
6. The manual seed either forces randomisation or reproduces the same output every time you run the process. Keep this field empty to enable random outputs.
7. Use the guidance scale to define how strictly the model follows your prompt.
8. Hit run!
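
For orientation, here is a minimal sketch of roughly what these settings correspond to in a plain diffusers pipeline (the model ID and all parameter values are only example assumptions, not this app's actual code):

```python
import torch
from diffusers import StableDiffusionPipeline

# 1./2. pick a device and a model (example model ID, not necessarily one offered by the app)
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1").to(device)

# 6. manual seed: a fixed seed reproduces the same image; omit the generator for random output
generator = torch.Generator(device).manual_seed(42)

# 4./5./7. prompt, negative prompt, inference steps and guidance scale
image = pipe(
    prompt="a lighthouse at sunset, oil painting",
    negative_prompt="blurry, low quality",
    num_inference_steps=25,   # more steps: slower, usually higher quality
    guidance_scale=7.5,       # how strictly the model follows the prompt
    generator=generator,
).images[0]

image.save("output.png")
```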

## Hints

The re-run button runs the process again and only applies changes you made in the Inference settings section, while the run button executes the whole process from scratch.

You have two options to persist your selected configuration: either copy the generated code to an environment where you can execute Python (e.g. Google Colab), or, after every successful run, head to the bottom of the page, where a table contains a link to this interface that encodes the whole configuration.

## Areas

### Model specific settings

This allows you to select any model hosted on Hugging Face. Some models are fine-tuned and require a trigger token to be activated, like https://huggingface.co/sd-dreambooth-library/herge-style.
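
Outside of this interface you would have to add the trigger token to the prompt yourself. A minimal sketch, with a placeholder for the token (look it up on the model card):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("sd-dreambooth-library/herge-style")

# the trigger token activates the fine-tuned style; "<trigger-token>" is a placeholder
prompt = "a portrait of a lighthouse keeper, <trigger-token> style"
image = pipe(prompt).images[0]
```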

The refiner improves the quality of your image by re-processing it a second time.
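
As a rough sketch of the idea, here is how a base/refiner pair is typically combined with diffusers (using the public SDXL checkpoints as an example; this is not necessarily what the app does internally):

```python
import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a castle in the alps, golden hour"
# the base pipeline outputs latents, which the refiner re-processes into the final image
latents = base(prompt=prompt, output_type="latent").images
image = refiner(prompt=prompt, image=latents).images[0]
```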

The pipeline supports a safety checker that prevents NSFW content from being created. This does not always work properly, so these two options allow you to disable the feature.
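
In a plain diffusers pipeline, disabling it corresponds to something like this (the model ID is just an example):

```python
from diffusers import StableDiffusionPipeline

# safety_checker=None turns the NSFW filter off entirely;
# requires_safety_checker=False suppresses the warning about running without it
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    safety_checker=None,
    requires_safety_checker=False,
)
```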

Attention slicing divides the attention operation into multiple steps instead of one huge step. On machines with less than 64 GB of memory or for images larger than 512x512 pixels, this may increase performance drastically. On Apple silicon (M1, M2), it is recommended to keep this setting enabled. See https://huggingface.co/docs/diffusers/optimization/mps
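
In diffusers this maps to a single call on the pipeline. A sketch (the model ID is just an example):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

# trade a little speed for a much smaller peak memory footprint
pipe.enable_attention_slicing()

# ...and turn it off again if it causes problems (see Troubleshooting below)
pipe.disable_attention_slicing()
```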

### Scheduler/Solver

This is the part of the process that manipulates the model's output in every loop/epoch.
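
Swapping the scheduler on an existing pipeline looks roughly like this in diffusers (the scheduler class and model ID are example choices):

```python
from diffusers import StableDiffusionPipeline, DDPMScheduler

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

# replace the default scheduler, reusing its configuration
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)
```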

### Auto Encoder

The auto encoder (VAE) is responsible for encoding and decoding between the input and the output image. VAE slicing and VAE tiling are parameters to improve performance here.
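
As a sketch of what these two switches do in diffusers (the model ID is just an example):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")

# decode the batch one image at a time instead of all at once
pipe.enable_vae_slicing()

# decode the image in tiles, which helps with very large resolutions
pipe.enable_vae_tiling()
```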

### Adapters

Adapters allow you to modify or control the output, e.g. apply specific styles. This interface supports Textual Inversion and LoRA.
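
In diffusers terms, loading the two adapter types looks roughly like this (the Textual Inversion concept is a public example repository; the LoRA repository ID is a hypothetical placeholder):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# Textual Inversion: adds a learned token to the text encoder's vocabulary
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# LoRA: loads small low-rank weight updates on top of the base model
pipe.load_lora_weights("some-user/some-style-lora")  # hypothetical repository ID
```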

# Customization

Update the file appConfig.json to add more models. Some models need you to accept their license agreement before you can access them, like https://huggingface.co/stabilityai/stable-diffusion-3-medium.

# Troubleshooting

## Result is a black image

Some parameters lead to black images. Deactivate or change them one by one and re-run the whole process (a sketch of the corresponding fixes follows the list):

- the safety checker
- a wrong auto encoder
- the wrong scheduler/solver (e.g. DPMMultiStep seems to be incompatible with SD15; better use DDPMScheduler)
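
A sketch of the corresponding fixes in a plain diffusers pipeline (the model ID is only an example):

```python
from diffusers import StableDiffusionPipeline, DDPMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    safety_checker=None,              # rule out the safety checker
    requires_safety_checker=False,
)

# rule out an incompatible scheduler by falling back to DDPM
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)
```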

I also faced a couple of bugs when using the attention_slicing method (see https://discuss.huggingface.co/t/activating-attention-slicing-leads-to-black-images-when-running-diffusion-more-than-once/68623):

- you cannot re-run the inference process when using attention_slicing
- don't pass cross_attention_kwargs or guidance_scale to the pipeline