metadata
datasets:
  - zalando-datasets/fashion_mnist
language:
  - en
metrics:
  - accuracy
pipeline_tag: image-classification
tags:
  - fashion
  - clothes
  - fashion_mnist
  - CNN
  - Classification

BeitForImageClassification

Model Structure

BeitModel

  • Embeddings: BeitEmbeddings

    • Uses patch embeddings with a Conv2d layer (3 input channels, 768 output channels, kernel size 16x16, stride 16x16).
    • Includes a dropout layer with probability 0.0.
  • Encoder: BeitEncoder

    • Contains 12 BeitLayer modules.
    • Each BeitLayer includes:
      • Attention: BeitAttention
        • BeitSelfAttention with linear layers for query, key, and value, dropout, and relative position bias.
        • BeitSelfOutput with a linear layer and dropout.
      • Intermediate: BeitIntermediate
        • Dense layer increasing dimensions from 768 to 3072, followed by GELU activation.
      • Output: BeitOutput
        • Dense layer reducing dimensions back to 768, with dropout.
      • LayerNorm applied before self-attention and again before the feed-forward (Intermediate/Output) block.
      • Drop Path (stochastic depth) on the residual branches, with a drop probability that varies by layer.
  • Pooler: BeitPooler

    • Contains a LayerNorm applied to the pooled output.
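
The shape arithmetic implied by the structure above can be sketched in plain Python. This is only an illustration under stated assumptions: the 224x224 input resolution, the maximum drop path rate of 0.1, and the linear per-layer schedule are not given in this card, and the helper names are hypothetical.

```python
# Sizes taken from the structure listed above.
PATCH_SIZE = 16
IN_CHANNELS = 3
HIDDEN_SIZE = 768
NUM_LAYERS = 12


def num_patches(image_size: int, patch_size: int = PATCH_SIZE) -> int:
    """Patches produced by a Conv2d whose kernel size equals its stride."""
    per_side = image_size // patch_size
    return per_side * per_side


def patch_embedding_params() -> int:
    """Weights plus bias of the patch-embedding Conv2d (bias assumed present)."""
    weights = HIDDEN_SIZE * IN_CHANNELS * PATCH_SIZE * PATCH_SIZE
    return weights + HIDDEN_SIZE


def drop_path_schedule(max_rate: float, num_layers: int = NUM_LAYERS) -> list:
    """ASSUMPTION: rates grow linearly from 0 at the first layer to max_rate."""
    if num_layers == 1:
        return [0.0]
    return [max_rate * i / (num_layers - 1) for i in range(num_layers)]


# ASSUMPTION: 224x224 input; the card does not state the resolution.
patches = num_patches(224)
conv_params = patch_embedding_params()
# ASSUMPTION: 0.1 is an illustrative maximum rate, not a value from this card.
rates = drop_path_schedule(0.1)

print(patches, conv_params, [round(r, 3) for r in rates])
```

Fashion-MNIST images are 28x28 grayscale, so they would have to be resized and expanded to 3 channels before reaching a patch embedding of this shape.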

Classifier: Linear

  • Linear layer mapping 768-dimensional embeddings to 10 output classes.
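
As a rough illustration of what this head computes, the sketch below applies a 768-to-10 linear map to one pooled embedding in plain Python. The weights and the input are random placeholders rather than the trained parameters, and the function names are hypothetical.

```python
import random

HIDDEN_SIZE = 768   # pooled embedding size from the structure above
NUM_CLASSES = 10    # output classes listed in this card


def linear_head(features, weight, bias):
    """Compute logits[j] = sum_i weight[j][i] * features[i] + bias[j]."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weight, bias)
    ]


def head_param_count(in_features=HIDDEN_SIZE, out_features=NUM_CLASSES):
    """Number of weights plus biases in the linear head."""
    return in_features * out_features + out_features


# Placeholder values only: a fixed seed keeps the sketch reproducible.
rng = random.Random(0)
weight = [[rng.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]
          for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
features = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN_SIZE)]

logits = linear_head(features, weight, bias)
print(len(logits), head_param_count())
```

The head is small compared with the encoder: 768 * 10 weights plus 10 biases.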

Detected Classes

The model is trained to classify images into the following 10 Fashion-MNIST classes:

  1. T-shirt / top
  2. Trouser
  3. Pullover
  4. Dress
  5. Coat
  6. Sandal
  7. Shirt
  8. Sneaker
  9. Bag
  10. Ankle boot
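
A minimal sketch of turning the classifier's 10 logits into one of the labels above. It assumes the model's output indices are zero-based and follow the order of this list (index 0 for the first entry, index 9 for the last); the card does not confirm the mapping, so check the model's own configuration before relying on it. `ID2LABEL`, `softmax`, and `predict` are illustrative names.

```python
import math

# ASSUMPTION: zero-based class indices in the order listed in this card.
ID2LABEL = {
    0: "T-shirt / top",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot",
}


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration; real values come from the classifier head.
example_logits = [0.0] * 9 + [5.0]
label, prob = predict(example_logits)
print(label, round(prob, 3))
```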