---
name: Stable Diffusion Model
description: Text-to-image generative model using PyTorch and Hugging Face Diffusers
version: 1.0.0
license: apache-2.0
authors:
  - name: Maneesh Singh
    url: https://github.com/Maneesh-Singh123
tags:
  - text-to-image
  - generative-model
  - pytorch
  - hugging-face
  - diffusers
model-type: latent-diffusion
task: image-generation
dataset: various
metrics:
  - psnr
  - ssim
  - frechet-inception-distance
parameters:
  - learning-rate: 5e-5
  - batch-size: 8
  - num-epochs: 10
  - num-inference-steps: 50
---

# Stable Diffusion Model - PyTorch & Hugging Face Diffusers

This repository contains an implementation of the **Stable Diffusion** model using **PyTorch** and **Hugging Face Diffusers**. Stable Diffusion is a text-to-image generative model that leverages a diffusion process to generate high-quality, detailed images from textual descriptions.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Model Overview](#model-overview)
- [Training](#training)
- [Inference](#inference)
- [Examples](#examples)
- [Acknowledgments](#acknowledgments)
- [License](#license)

## Installation

To get started, clone this repository and install the required dependencies. We recommend using a virtual environment to avoid dependency conflicts.

```bash
git clone https://github.com/the-antique-piece/stable_diffusion.git
cd stable_diffusion

# Create and activate a virtual environment (optional)
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install required dependencies
pip install -r requirements.txt
```

### Requirements

- Python 3.8+
- PyTorch
- Hugging Face Diffusers
- Transformers
- Datasets
- Pillow (PIL)

To install all dependencies manually, you can run:

```bash
pip install torch diffusers transformers datasets pillow flax
```

## Model Overview

Stable Diffusion is a **latent diffusion model** trained to denoise a latent representation of an image, conditioned on a text prompt. It operates by gradually reversing a noise process applied to the data during training, allowing it to generate images starting from pure noise.

This repository implements the following features:

- **Text-to-image generation**: Generate images from a text prompt.
- **Fine-tuning**: Customize the model for specific datasets.
- **Inference**: Run the model on pre-trained weights for fast image generation.

### Model Architecture

The Stable Diffusion model consists of:

1. **Variational Autoencoder (VAE)** - Encodes images into latent space and decodes latents back into images.
2. **U-Net** - A denoising network that learns to reverse the noise process.
3. **Text Encoder** - Encodes text prompts into embeddings that guide image generation.
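Concretely, the U-Net is trained with the standard noise-prediction objective of latent diffusion. The following is a sketch in the usual notation of Ho et al. (2020) and Rombach et al. (2022), not something taken from this repository's code:

```math
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,
\qquad
L(\theta) = \mathbb{E}_{z_0,\, \epsilon \sim \mathcal{N}(0, I),\, t,\, c}
\left[ \lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2 \right]
```

Here $z_0$ is the VAE latent of a training image, $\bar{\alpha}_t$ follows the noise schedule, $c$ is the text-encoder output, and $\epsilon_\theta$ is the U-Net's noise prediction; at inference the process runs in reverse, starting from pure Gaussian noise.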
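To make the division of labor among these three components concrete, the pipeline can also be driven by hand. The following is a minimal sketch in the style of the Diffusers "deconstructed pipeline" tutorial, not this repository's code: the `CompVis/stable-diffusion-v1-4` checkpoint id, the 50-step PNDM schedule, and the output filename are illustrative assumptions, and classifier-free guidance is omitted for brevity (the full pipeline applies it, so expect weaker prompt adherence here).

```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
from transformers import CLIPTextModel, CLIPTokenizer
from PIL import Image

# Illustrative checkpoint; substitute a local weights directory if you have one
model_id = "CompVis/stable-diffusion-v1-4"
device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. VAE: maps images to and from the latent space the diffusion runs in
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device)
# 2. U-Net: predicts the noise to remove at each step, conditioned on the text
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)
# 3. Text encoder: turns the prompt into embeddings that steer denoising
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device)
scheduler = PNDMScheduler.from_pretrained(model_id, subfolder="scheduler")

prompt = ["a cat sitting on a windowsill, looking at the sunset"]
tokens = tokenizer(prompt, padding="max_length",
                   max_length=tokenizer.model_max_length,
                   truncation=True, return_tensors="pt")
with torch.no_grad():
    text_embeddings = text_encoder(tokens.input_ids.to(device))[0]

# Start from pure Gaussian noise in latent space (512x512 pixels -> 64x64 latents)
latents = torch.randn(1, unet.config.in_channels, 64, 64, device=device)
scheduler.set_timesteps(50)
latents = latents * scheduler.init_noise_sigma

# Reverse the noise process step by step
for t in scheduler.timesteps:
    latent_input = scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_pred = unet(latent_input, t, encoder_hidden_states=text_embeddings).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample

# Decode the final latents back to pixel space with the VAE
with torch.no_grad():
    decoded = vae.decode(latents / vae.config.scaling_factor).sample

# Map from [-1, 1] to an 8-bit image and save
decoded = (decoded / 2 + 0.5).clamp(0, 1)
array = (decoded[0].permute(1, 2, 0).cpu().numpy() * 255).round().astype("uint8")
Image.fromarray(array).save("manual_pipeline.png")
```

In practice you would use `DiffusionPipeline` as shown in the Usage section below; this loop only spells out where the VAE, U-Net, and text encoder fit.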
## Usage

### Text-to-Image Generation

Once the environment is set up, you can generate images from text prompts as follows:

```python
from diffusers import DiffusionPipeline
import torch

# Load the pipeline (omit torch_dtype=torch.float16 here so it also works on CPU)
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion/stable-diffusion-v1")

# Use an NVIDIA GPU if available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipeline.to(device)

image = pipeline("An image of a futuristic city where everything is perfect").images[0]
```

### Controlling Image Generation

For more control over generation, save the following code in a separate Python file and run it from your virtual environment with `python python_script.py`.

**If you are not running the model on an NVIDIA GPU, change `torch_dtype` to `torch.float32` and remove the `.to("cuda")` call.**

```python
from diffusers import DiffusionPipeline
import torch

# Provide the path to the directory containing model_index.json
weights_path = "directory_path_to_model_index.json"
pipeline = DiffusionPipeline.from_pretrained(weights_path, torch_dtype=torch.float16)
pipeline.to("cuda")  # float16 weights require a GPU

# Change the prompt to get different images; increase num_inference_steps for higher-quality output
prompt = "a cat sitting on a windowsill, looking at the sunset"
height, width = 512, 512
num_inference_steps = 50

image = pipeline(prompt, height=height, width=width, num_inference_steps=num_inference_steps).images[0]
image.save("myimage.png")
```

### Custom Model Weights

If you have custom model weights, load them into the pipeline:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("path/to/your/model").to("cuda")
```

## Training

This repository also supports fine-tuning the Stable Diffusion model on your own dataset. To prepare for training:

1. **Prepare the dataset**: Ensure that your dataset is in a format compatible with Hugging Face's `datasets` library.
2. **Configure training parameters**: Adjust hyperparameters such as the learning rate, batch size, and number of epochs.

### Fine-tuning Example

```bash
python train.py --dataset_path /path/to/dataset --output_dir /path/to/output --batch_size 8 --learning_rate 5e-5 --num_epochs 10
```

Training is done with the `train.py` script, which supports distributed training for large datasets and multiple GPUs.

## Inference

To run inference on a trained model, use the `inference.py` script:

```bash
python inference.py --model_path /path/to/trained/model --prompt "a futuristic city skyline at sunset"
```

## Examples

Here are some example prompts and the corresponding generated images:

- **Prompt**: "a cat sitting on a windowsill, looking at the sunset"

  ![Example Image 1](generated_images/image8.png)

- **Prompt**: "a futuristic cityscape with flying cars"

  ![Example Image 2](generated_images/image7.png)

## Acknowledgments

This implementation builds on the **Stable Diffusion** model and uses the Hugging Face [Diffusers](https://github.com/huggingface/diffusers.git) library.

## License

This project is licensed under the terms of the [Apache 2.0 License](LICENSE).