metadata

datasets:
  - zalando-datasets/fashion_mnist
language:
  - en
metrics:
  - accuracy
pipeline_tag: image-classification
tags:
  - fashion
  - clothes
  - fashion_mnist
  - CNN
  - Classification

BeitForImageClassification

Model Structure

BeitModel

Embeddings: BeitEmbeddings
- Uses patch embeddings with a Conv2d layer (3 input channels, 768 output channels, kernel size 16x16, stride 16x16).
- Includes a dropout layer with probability 0.0.
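As a rough check of the patch-embedding arithmetic above, the sketch below computes how many patches the Conv2d layer produces and how many parameters it holds. The 224x224 input resolution is an assumption (this card does not state it), and the helper names are illustrative only.

```python
# Patch-embedding arithmetic for Conv2d(3, 768, kernel_size=16, stride=16).
# ASSUMPTION: 224x224 input images; the card does not state the resolution.

def num_patches(image_size: int, patch_size: int = 16) -> int:
    """Count non-overlapping patches (stride equals kernel size)."""
    per_side = image_size // patch_size
    return per_side * per_side

def conv_params(in_channels: int = 3, out_channels: int = 768,
                patch_size: int = 16) -> int:
    """Count convolution weights plus one bias per output channel."""
    weights = in_channels * out_channels * patch_size * patch_size
    return weights + out_channels

patches = num_patches(224)
params = conv_params()
print(f"patches per image: {patches}")
print(f"patch-embedding parameters: {params}")
```

Fashion-MNIST images are 28x28 grayscale, so they presumably need to be resized and expanded to three channels before they reach this layer.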
Encoder: BeitEncoder
- Contains 12 BeitLayer modules.
- Each BeitLayer includes:
  - Attention: BeitAttention
    - BeitSelfAttention with linear layers for query, key, and value, dropout, and relative position bias.
    - BeitSelfOutput with a linear layer and dropout.
  - Intermediate: BeitIntermediate
    - Dense layer increasing dimensions from 768 to 3072, followed by GELU activation.
  - Output: BeitOutput
    - Dense layer reducing dimensions back to 768, with dropout.
  - LayerNorm applied before the self-attention block and again before the intermediate (feed-forward) block.
  - Drop Path mechanism with varying probability across layers.
Pooler: BeitPooler
- Contains a layer normalization.
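To illustrate what "varying probability across layers" can look like for Drop Path, here is a minimal sketch of a linearly increasing schedule over the 12 encoder layers. A linear ramp from 0 to a maximum rate is the usual stochastic-depth rule, but whether this checkpoint uses exactly that schedule, and the maximum rate of 0.1, are assumptions; the function name is hypothetical.

```python
# Linearly increasing drop-path (stochastic depth) rates, one per encoder layer.
# ASSUMPTION: max_rate=0.1 is illustrative; the card does not give the value.

def drop_path_schedule(num_layers: int = 12, max_rate: float = 0.1) -> list[float]:
    """Return one drop probability per layer, from 0.0 up to max_rate."""
    if num_layers == 1:
        return [0.0]
    step = max_rate / (num_layers - 1)
    return [i * step for i in range(num_layers)]

rates = drop_path_schedule()
for index, rate in enumerate(rates):
    print(f"layer {index:2d}: drop_path = {rate:.4f}")
```

Under this schedule the first layer never drops its residual branch, and the deepest layer drops it most often during training. Drop Path is inactive at inference time.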

Classifier: Linear
- Linear layer mapping 768-dimensional embeddings to 10 output classes.

Detected Classes

The model is trained to classify images into the following 10 classes:

- T-shirt / top
- Trouser
- Pullover
- Dress
- Coat
- Sandal
- Shirt
- Sneaker
- Bag
- Ankle boot
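The sketch below shows one way to turn the classifier's 10 output logits into a class name. It assumes the output index order follows the list above, which matches the standard Fashion-MNIST label order (0 = T-shirt / top through 9 = Ankle boot). The logits are made-up values, and `predict_label` is a hypothetical helper, not part of any library.

```python
# Map 10 classifier logits to a class name.
# ASSUMPTION: index order follows the class list in this card
# (the standard Fashion-MNIST label order).

CLASS_NAMES = [
    "T-shirt / top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def predict_label(logits: list[float]) -> str:
    """Return the class name with the highest logit."""
    if len(logits) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} logits, got {len(logits)}")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return CLASS_NAMES[best_index]

# Made-up logits for illustration only.
example_logits = [0.1, -1.2, 0.3, 0.0, 0.2, -0.5, 0.4, 2.7, -0.3, 1.1]
print(predict_label(example_logits))
```

If the checkpoint's configuration ships its own index-to-label mapping, that mapping should take precedence over the hard-coded list here.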