metadata
datasets:
- zalando-datasets/fashion_mnist
language:
- en
metrics:
- accuracy
pipeline_tag: image-classification
tags:
- fashion
- clothes
- fashion_mnist
- CNN
- Classification
BeitForImageClassification
Model Structure
BeitModel
Embeddings: BeitEmbeddings
- Uses patch embeddings with a Conv2d layer (3 input channels, 768 output channels, kernel size 16x16, stride 16x16).
- Includes a dropout layer with probability 0.0.
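The effect of the patch-embedding convolution can be checked with a little arithmetic. The sketch below is only illustrative: the channel count, hidden size, kernel size, and stride come from the description above, while the 224x224 input resolution and the function name `patch_embedding_stats` are assumptions, not values stated in this card. Fashion-MNIST images are 28x28 grayscale, so they would presumably need resizing and channel replication before reaching this layer.

```python
def patch_embedding_stats(image_size=224, patch_size=16, in_channels=3, hidden_size=768):
    """Count patches and Conv2d parameters for a non-overlapping patch embedding.

    image_size=224 is an assumed input resolution (not stated in the card).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    # Stride equals kernel size, so patches tile the image without overlap.
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2

    # Conv2d weights: out_channels * in_channels * kernel_h * kernel_w, plus one bias per output channel.
    conv_params = hidden_size * in_channels * patch_size * patch_size + hidden_size

    return {
        "patches_per_side": patches_per_side,
        "num_patches": num_patches,
        "conv_params": conv_params,
    }


stats = patch_embedding_stats()
print(stats)
```

Under these assumptions each image becomes a sequence of 196 patch tokens, each a 768-dimensional vector.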
Encoder: BeitEncoder
- Contains 12 BeitLayer modules.
- Each BeitLayer includes:
  - Attention: BeitAttention
    - BeitSelfAttention with linear layers for query, key, and value, dropout, and relative position bias.
    - BeitSelfOutput with a linear layer and dropout.
  - Intermediate: BeitIntermediate
    - Dense layer increasing dimensions from 768 to 3072, followed by GELU activation.
  - Output: BeitOutput
    - Dense layer reducing dimensions back to 768, with dropout.
  - LayerNorm applied before and after main operations.
  - Drop Path mechanism with varying probability across layers.
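Two details of the encoder layers can be made concrete with a short sketch: the size of the 768 → 3072 → 768 feed-forward pair, and one common way a drop path probability can vary across layers. The linear schedule and the maximum rate of 0.1 are assumptions for illustration (this card does not state the actual schedule or rate), and both function names are hypothetical.

```python
def feed_forward_params(hidden_size=768, intermediate_size=3072):
    """Parameters in the two dense layers of one encoder block (weights plus biases)."""
    expand = hidden_size * intermediate_size + intermediate_size   # 768 -> 3072
    project = intermediate_size * hidden_size + hidden_size        # 3072 -> 768
    return expand + project


def linear_drop_path_rates(max_rate, num_layers):
    """Assumed schedule: drop path probability grows linearly from 0 to max_rate."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [0.0]
    return [max_rate * i / (num_layers - 1) for i in range(num_layers)]


ff_params = feed_forward_params()
rates = linear_drop_path_rates(max_rate=0.1, num_layers=12)  # 0.1 is a hypothetical value

print(ff_params)
print([round(r, 3) for r in rates])
```

With a schedule like this, early layers are almost never skipped during training, while the deepest layers are skipped most often.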
Pooler: BeitPooler
- Contains a layer normalization.
Classifier: Linear
- Linear layer mapping 768-dimensional embeddings to 10 output classes.
Detected Classes
The model classifies each input image into one of the following 10 classes:
- T-shirt / top
- Trouser
- Pullover
- Dress
- Coat
- Sandal
- Shirt
- Sneaker
- Bag
- Ankle boot
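To turn the classifier's 10 output scores into a class name, pick the index of the largest score and look it up in the label list. The sketch below assumes the class indices follow the order listed above (which matches the standard Fashion-MNIST label order); the model's own label mapping should be checked before relying on it. The function name `predict_label` and the example scores are hypothetical.

```python
# Assumed index-to-name mapping: same order as the list above.
CLASS_NAMES = [
    "T-shirt / top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]


def predict_label(logits):
    """Return the class name with the highest score among 10 logits."""
    if len(logits) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} logits, got {len(logits)}")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return CLASS_NAMES[best_index]


# Made-up scores for illustration; index 7 holds the largest value.
example_logits = [0.1, -1.2, 0.3, 0.0, 0.2, -0.5, 0.4, 2.7, -0.3, 1.1]
predicted = predict_label(example_logits)
print(predicted)  # Sneaker
```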