Image Feature Extraction

Masked Autoencoders (MAE) with PCA-Based Variance Filtering

Overview

This repository contains models trained using Masked Autoencoders (MAE) on the CIFAR-10 and STL-10 datasets. The models are trained under different settings, focusing on reconstructing images from principal component analysis (PCA) components with varying explained variances. The goal is to explore how concentrating on low-variance image components can improve representation learning. For background on the distinction between low-variance and high-variance components of a dataset, see the paper "Learning by Reconstruction Produces Uninformative Features For Perception" by Randall Balestriero and Yann LeCun, available on arXiv.

Models Included:

  • No Mode: MAE trained on original images.
  • Bottom 25% Variance: MAE trained to reconstruct images using components with the lowest 25% variance.
  • Bottom 10% Variance: MAE trained to reconstruct images using components with the lowest 10% variance.
  • Top 75% Variance: MAE trained to reconstruct images using components with the highest 75% variance.
  • Top 60% Variance: MAE trained to reconstruct images using components with the highest 60% variance.

Model Details

Dataset

  • CIFAR-10 and STL-10: Both datasets were used for training. Each consists of labeled images across several classes; PCA was performed on these datasets to separate image components by explained variance.

Training Procedure

  • PCA Application: PCA was applied to the dataset images to separate components by explained variance.
  • MAE Training:
    • No Mode: Standard MAE training on the original images.
    • Bottom 25% Variance: MAE model trained to reconstruct images using only the bottom 25% variance components.
    • Top 75% Variance: MAE model trained to reconstruct images using the top 75% variance components.
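The PCA filtering step above can be sketched as follows. This is a minimal illustration using scikit-learn, not the repository's actual preprocessing code; the function and variable names (`low_variance_filter`, `keep_fraction`) are illustrative, and the cumulative-variance cutoff is an approximation of "bottom 25% variance components":

```python
import numpy as np
from sklearn.decomposition import PCA

def low_variance_filter(images, keep_fraction=0.25):
    """Reconstruct images using only the principal components that
    account for the lowest `keep_fraction` of explained variance.
    `images` is an (N, D) array of flattened images."""
    pca = PCA()
    coeffs = pca.fit_transform(images)
    # components_ are sorted by explained variance (descending), so the
    # "bottom 25%" components are the tail of the cumulative-variance curve
    cum = np.cumsum(pca.explained_variance_ratio_)
    mask = cum > (1.0 - keep_fraction)  # True for the low-variance tail
    # zero out the high-variance components, then map back to pixel space
    return pca.inverse_transform(coeffs * mask)
```

With `keep_fraction=1.0` every component is kept and the original images are recovered; the top-75% setting corresponds to masking the complementary set of components.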

Evaluation

  • Fine-Tuning: The pre-trained models were fine-tuned for classification tasks.
  • Linear Probing: The models' representations were evaluated using linear probing to assess their quality.
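Linear probing keeps the pre-trained encoder frozen and trains only a linear classifier on top of its features, so the probe's accuracy reflects the quality of the learned representations. A minimal sketch (names such as `linear_probe` and `feature_dim` are illustrative, not from the repository):

```python
import torch
import torch.nn as nn

def linear_probe(encoder, feature_dim, num_classes, loader, epochs=10, lr=1e-3):
    """Train a linear classifier on frozen encoder features."""
    encoder.eval()  # freeze the pre-trained encoder
    for p in encoder.parameters():
        p.requires_grad = False
    head = nn.Linear(feature_dim, num_classes)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                feats = encoder(x)  # representations are never updated
            loss = loss_fn(head(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```

Fine-tuning differs only in that the encoder's parameters are left trainable and updated together with the classification head.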

Results

The models demonstrate varying performance based on the variance components they focused on during training:

  • Bottom 25% Variance: Expected to yield better representations, especially for detailed and nuanced image features.
  • Top 75% Variance: Expected to perform worse due to the focus on broader, less informative features.

Usage

The models trained on the CIFAR-10 dataset can be downloaded directly from their respective folders.

How to Use

To use a pre-trained model, load it as follows (replace 'path_to_model.pt' with the path to the downloaded checkpoint):

import torch

# Load the checkpoint onto the CPU; move it to a GPU afterwards if needed
model = torch.load('path_to_model.pt', map_location='cpu')
model.eval()  # switch to evaluation mode before inference

# Example usage: input_data should be a batch of images,
# e.g. a tensor of shape (N, 3, 32, 32) for CIFAR-10
with torch.no_grad():
    output = model(input_data)

Additional Resources

  • GitHub Repository: For training scripts and further details, visit the GitHub Repository.