---
tags:
- autoencoder
- image-colorization
- pytorch
- pytorch_model_hub_mixin
license: apache-2.0
datasets:
- flwrlabs/celeba
language:
- en
metrics:
- mse
pipeline_tag: image-to-image
---

# Model Colorization Autoencoder

## Model Description

This autoencoder is designed for image colorization: it takes grayscale images as input and outputs colorized versions of them. The model uses an encoder-decoder architecture, where the encoder compresses the input image into a latent representation and the decoder reconstructs the image in color.

### Architecture

- **Encoder**: Three convolutional blocks (Conv2d → MaxPool2d → ReLU → BatchNorm2d) reduce the single-channel grayscale input through 64, 32, and 16 feature maps. A flattening layer and a fully connected layer then produce a 4000-dimensional latent vector.
- **Decoder**: The decoder mirrors the encoder: a linear layer expands the latent vector back to a 16×45×45 feature map, which three transposed convolutions (with ReLU activations and batch normalization) upsample back to full resolution. The final layer outputs a 3-channel color image through a sigmoid activation.

The architecture details are as follows:
```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class ModelColorization(nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super().__init__()
        # Encoder: three Conv -> MaxPool -> ReLU -> BatchNorm blocks,
        # then flatten to a 4000-dimensional latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 16, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Flatten(),
            nn.Linear(16 * 45 * 45, 4000),
        )
        # Decoder: expand the latent vector back to a 16x45x45 feature map,
        # then upsample with three transposed convolutions to a 3-channel image.
        self.decoder = nn.Sequential(
            nn.Linear(4000, 16 * 45 * 45),
            nn.ReLU(),
            nn.Unflatten(1, (16, 45, 45)),
            nn.ConvTranspose2d(16, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.ConvTranspose2d(32, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
```
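
The sizes in the `Flatten` and `Linear` layers imply a 360×360 grayscale input: three 2× poolings reduce 360 to 45, and 16 × 45 × 45 = 32,400 matches the linear layer. A quick shape check under that assumption:

```python
import torch

model = ModelColorization().eval()
with torch.no_grad():
    gray = torch.randn(1, 1, 360, 360)  # (batch, channels, height, width)
    color = model(gray)
print(color.shape)  # torch.Size([1, 3, 360, 360])
```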

### Training Details
The model was trained with PyTorch for 5 epochs. The training and validation losses observed during training were:

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.0063        | 0.0042          |
| 2     | 0.0036        | 0.0035          |
| 3     | 0.0032        | 0.0032          |
| 4     | 0.0030        | 0.0030          |
| 5     | 0.0029        | 0.0030          |

Training loss decreased steadily across all five epochs, while validation loss plateaued at 0.0030 by epoch 4, suggesting the model was approaching convergence.
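
For reference, here is a minimal sketch of this kind of training loop. The MSE loss follows the `mse` metric in the card's metadata; the Adam optimizer, learning rate, and dummy data are illustrative assumptions, not details confirmed by the card:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors standing in for the real data (the card lists flwrlabs/celeba):
# color targets in [0, 1] and naively converted grayscale inputs.
color_images = torch.rand(8, 3, 360, 360)
gray_images = color_images.mean(dim=1, keepdim=True)
train_loader = DataLoader(TensorDataset(gray_images, color_images), batch_size=4)

model = ModelColorization()
criterion = nn.MSELoss()  # matches the `mse` metric in the card metadata
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer and lr

for epoch in range(5):
    model.train()
    for gray, color in train_loader:
        optimizer.zero_grad()
        pred = model(gray)             # predicted colorization
        loss = criterion(pred, color)  # pixel-wise reconstruction error
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch + 1}: training loss {loss.item():.4f}")
```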

### Usage
The model is a custom `nn.Module` rather than a `transformers` architecture, so it is loaded through the `from_pretrained` method that `PyTorchModelHubMixin` adds to the class itself (the `ModelColorization` definition above must be available in your environment), not through `AutoModel`:

```python
# Install the dependencies first:
#   pip install torch huggingface_hub

# ModelColorization inherits from PyTorchModelHubMixin, which provides
# `from_pretrained` for downloading and loading weights from the Hub.
model = ModelColorization.from_pretrained("sebastiansarasti/AutoEncoderImageColorization")
```
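
Once the model is loaded, inference on a single image might look like the sketch below. The 360×360 input size is inferred from the architecture, and the file paths are placeholders:

```python
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((360, 360)),  # input size implied by the Flatten/Linear layers
    transforms.ToTensor(),          # scales pixel values to [0, 1]
])

image = Image.open("photo.jpg")        # placeholder path
gray = preprocess(image).unsqueeze(0)  # add batch dimension -> (1, 1, 360, 360)

model.eval()
with torch.no_grad():
    color = model(gray)                # (1, 3, 360, 360), values in [0, 1]

transforms.ToPILImage()(color.squeeze(0)).save("colorized.jpg")
```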