---
name: Stable Diffusion Model
description: Text-to-image generative model using PyTorch and Hugging Face Diffusers
version: 1.0.0
license: apache-2.0
authors:
  - name: Maneesh Singh
    url: https://github.com/Maneesh-Singh123
tags:
  - text-to-image
  - generative-model
  - pytorch
  - hugging-face
  - diffusers
model-type: latent-diffusion
task: image-generation
dataset: various
metrics:
  - psnr
  - ssim
  - frechet-inception-distance
parameters:
  - learning-rate: 5e-5
  - batch-size: 8
  - num-epochs: 10
  - num-inference-steps: 50
---

# Stable Diffusion Model - PyTorch & Hugging Face Diffusers

This repository contains an implementation of the **Stable Diffusion** model using **PyTorch** and **Hugging Face Diffusers**. Stable Diffusion is a text-to-image generative model that leverages a diffusion process to generate high-quality, detailed images from textual descriptions.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Model Overview](#model-overview)
- [Training](#training)
- [Inference](#inference)
- [Examples](#examples)
- [Acknowledgments](#acknowledgments)
- [License](#license)

## Installation

To get started, clone this repository and install the required dependencies. We recommend using a virtual environment to avoid dependency conflicts.

```bash
git clone https://github.com/the-antique-piece/stable_diffusion.git
cd stable_diffusion

# Create and activate a virtual environment (optional)
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install required dependencies
pip install -r requirements.txt
```

### Requirements

- Python 3.8+
- PyTorch
- Hugging Face Diffusers
- Transformers
- Datasets
- Pillow (PIL)

To install all dependencies manually, you can run:

```bash
pip install torch diffusers transformers datasets pillow flax
```

## Model Overview

Stable Diffusion is a **latent diffusion model** trained to denoise a latent representation of an image, conditioned on a text prompt. It operates by gradually reversing a noise process applied to the data during training, allowing it to generate images starting from pure noise.

This repository implements the following features:

- **Text-to-image generation**: Generate images from a text prompt.
- **Fine-tuning**: Customize the model for specific datasets.
- **Inference**: Run the model on pre-trained weights for fast image generation.

### Model Architecture

The Stable Diffusion model consists of:

1. **Variational Autoencoder (VAE)** - Encodes images into latent space and decodes latents back into images.
2. **U-Net** - A denoising network that learns to reverse the noise process.
3. **Text Encoder** - Encodes text prompts into embeddings that guide image generation.
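Concretely, the U-Net is trained with the standard noise-prediction objective of latent diffusion. The following is a sketch in the usual notation of Ho et al. (2020) and Rombach et al. (2022), not something taken from this repository's code:

```math
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,
\qquad
L(\theta) = \mathbb{E}_{z_0,\, \epsilon \sim \mathcal{N}(0, I),\, t,\, c}
\left[ \lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2 \right]
```

Here $z_0$ is the VAE latent of a training image, $\bar{\alpha}_t$ follows the noise schedule, $c$ is the text-encoder output, and $\epsilon_\theta$ is the U-Net's noise prediction; at inference the process runs in reverse, starting from pure Gaussian noise.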
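To make the division of labor among these three components concrete, the pipeline can also be driven by hand. The following is a minimal sketch in the style of the Diffusers "deconstructed pipeline" tutorial, not this repository's code: the `CompVis/stable-diffusion-v1-4` checkpoint id, the 50-step PNDM schedule, and the output filename are illustrative assumptions, and classifier-free guidance is omitted for brevity (the full pipeline applies it, so expect weaker prompt adherence here).

```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
from transformers import CLIPTextModel, CLIPTokenizer
from PIL import Image

# Illustrative checkpoint; substitute a local weights directory if you have one
model_id = "CompVis/stable-diffusion-v1-4"
device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. VAE: maps images to and from the latent space the diffusion runs in
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device)
# 2. U-Net: predicts the noise to remove at each step, conditioned on the text
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)
# 3. Text encoder: turns the prompt into embeddings that steer denoising
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device)
scheduler = PNDMScheduler.from_pretrained(model_id, subfolder="scheduler")

prompt = ["a cat sitting on a windowsill, looking at the sunset"]
tokens = tokenizer(prompt, padding="max_length",
                   max_length=tokenizer.model_max_length,
                   truncation=True, return_tensors="pt")
with torch.no_grad():
    text_embeddings = text_encoder(tokens.input_ids.to(device))[0]

# Start from pure Gaussian noise in latent space (512x512 pixels -> 64x64 latents)
latents = torch.randn(1, unet.config.in_channels, 64, 64, device=device)
scheduler.set_timesteps(50)
latents = latents * scheduler.init_noise_sigma

# Reverse the noise process step by step
for t in scheduler.timesteps:
    latent_input = scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_pred = unet(latent_input, t, encoder_hidden_states=text_embeddings).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample

# Decode the final latents back to pixel space with the VAE
with torch.no_grad():
    decoded = vae.decode(latents / vae.config.scaling_factor).sample

# Map from [-1, 1] to an 8-bit image and save
decoded = (decoded / 2 + 0.5).clamp(0, 1)
array = (decoded[0].permute(1, 2, 0).cpu().numpy() * 255).round().astype("uint8")
Image.fromarray(array).save("manual_pipeline.png")
```

In practice you would use `DiffusionPipeline` as shown in the Usage section below; this loop only spells out where the VAE, U-Net, and text encoder fit.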
## Usage

### Text-to-Image Generation

Once the environment is set up, you can generate images from text prompts as follows:

```python
from diffusers import DiffusionPipeline
import torch

# Load the pipeline (omit torch_dtype=torch.float16 here so it also works on CPU)
pipeline = DiffusionPipeline.from_pretrained("stable-diffusion/stable-diffusion-v1")

# Use an NVIDIA GPU if available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipeline.to(device)

image = pipeline("An image of a futuristic city where everything is perfect").images[0]
```

### Controlling Image Generation

For more control over generation, save the following code in a separate Python file and run it from your virtual environment with `python python_script.py`.

**If you are not running the model on an NVIDIA GPU, change `torch_dtype` to `torch.float32` and remove the `.to("cuda")` call.**

```python
from diffusers import DiffusionPipeline
import torch

# Provide the path to the directory containing model_index.json
weights_path = "directory_path_to_model_index.json"
pipeline = DiffusionPipeline.from_pretrained(weights_path, torch_dtype=torch.float16)
pipeline.to("cuda")  # float16 weights require a GPU

# Change the prompt to get different images; increase num_inference_steps for higher-quality output
prompt = "a cat sitting on a windowsill, looking at the sunset"
height, width = 512, 512
num_inference_steps = 50

image = pipeline(prompt, height=height, width=width, num_inference_steps=num_inference_steps).images[0]
image.save("myimage.png")
```

### Custom Model Weights

If you have custom model weights, load them into the pipeline:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("path/to/your/model").to("cuda")
```

## Training

This repository also supports fine-tuning the Stable Diffusion model on your own dataset. To prepare for training:

1. **Prepare the dataset**: Ensure that your dataset is in a format compatible with Hugging Face's `datasets` library.
2. **Configure training parameters**: Adjust hyperparameters such as the learning rate, batch size, and number of epochs.

### Fine-tuning Example

```bash
python train.py --dataset_path /path/to/dataset --output_dir /path/to/output --batch_size 8 --learning_rate 5e-5 --num_epochs 10
```

Training is done with the `train.py` script, which supports distributed training for large datasets and multiple GPUs.

## Inference

To run inference on a trained model, use the `inference.py` script:

```bash
python inference.py --model_path /path/to/trained/model --prompt "a futuristic city skyline at sunset"
```

## Examples

Here are some example prompts and the corresponding generated images:

- **Prompt**: "a cat sitting on a windowsill, looking at the sunset"

  ![Example Image 1](generated_images/image8.png)

- **Prompt**: "a futuristic cityscape with flying cars"

  ![Example Image 2](generated_images/image7.png)

## Acknowledgments

This implementation builds on the **Stable Diffusion** model and uses the Hugging Face [Diffusers](https://github.com/huggingface/diffusers.git) library.

## License

This project is licensed under the terms of the [Apache 2.0 License](LICENSE).