This model is a fine-tuned version of facebook/deit-tiny-distilled-patch16-224 on the docornot dataset.

It achieves the following results on the evaluation set:

  • Loss: 0.0000
  • Accuracy: 1.0

CO2 emissions

This model was trained on an Apple M1 machine, and training emitted an estimated 0.322 g of CO2 (measured with CodeCarbon).

Model description

This model is a distilled Vision Transformer (ViT). Images are presented to the model as a sequence of fixed-size 16x16 patches, which are linearly embedded.
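The patch sequence described above can be illustrated with a small pure-Python sketch. It assumes 224x224 RGB inputs and 16x16 patches, as the base checkpoint name facebook/deit-tiny-distilled-patch16-224 indicates. It omits the learned linear projection that maps each flattened patch to DeiT-tiny's 192-dimensional embedding. The helper names `patchify` and `sequence_length` are hypothetical and are not part of any library.

```python
# Minimal sketch of ViT/DeiT patch tokenization using plain Python lists.
# Assumptions: 224x224 RGB input and 16x16 patches (per the base checkpoint name);
# the learned linear projection to the embedding dimension is omitted.

IMAGE_SIZE = 224
PATCH_SIZE = 16
CHANNELS = 3


def patchify(image, patch_size=PATCH_SIZE):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Patches are returned in row-major order; each patch is a flat list of
    patch_size * patch_size * C values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    flat.extend(pixel)
            patches.append(flat)
    return patches


def sequence_length(image_size=IMAGE_SIZE, patch_size=PATCH_SIZE, distilled=True):
    """Number of tokens the transformer sees: the patches plus special tokens.

    A distilled DeiT prepends both a [CLS] token and a distillation token.
    """
    num_patches = (image_size // patch_size) ** 2
    return num_patches + (2 if distilled else 1)


if __name__ == "__main__":
    blank = [[[0.0] * CHANNELS for _ in range(IMAGE_SIZE)] for _ in range(IMAGE_SIZE)]
    patches = patchify(blank)
    print(len(patches), len(patches[0]))  # 196 768
    print(sequence_length())  # 198
```

A 224x224 image is split into a 14x14 grid, which gives 196 patches of 16 x 16 x 3 = 768 values each. With the two special tokens, the transformer processes 198 tokens.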

Intended uses & limitations

You can use this model to classify whether an image is a picture or a document.
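The following is a hedged usage sketch. It assumes the model is published on the Hugging Face Hub as "Mozilla/docornot" and loads with the standard Transformers image-classification pipeline. The exact label strings come from the model's config and are not listed in this card. The helpers `top_prediction` and `classify_image` are hypothetical. `transformers` is imported lazily so that the pure helper can run without it.

```python
# Hedged usage sketch. Assumptions: Hub model id "Mozilla/docornot" and the
# standard image-classification pipeline; label names are whatever the model's
# config defines (not stated in this card). Helper names are hypothetical.


def top_prediction(predictions):
    """Return (label, score) for the highest-scoring entry of pipeline output.

    `predictions` is a list of dicts with "label" and "score" keys.
    """
    if not predictions:
        raise ValueError("no predictions to choose from")
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


def classify_image(path_or_url, model_id="Mozilla/docornot"):
    """Classify one image as a picture or a document (needs transformers and Pillow)."""
    from transformers import pipeline  # lazy import: top_prediction works without it

    classifier = pipeline("image-classification", model=model_id)
    return top_prediction(classifier(path_or_url))
```

Typical usage is `label, score = classify_image("scan.png")`, where "scan.png" is a hypothetical local file. When classifying many images, create the pipeline once and reuse it rather than rebuilding it on every call.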

Training procedure

Source code used to generate this model: https://github.com/mozilla/docornot

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
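The linear learning-rate schedule listed above can be sketched as follows. This assumes no warmup, since none is listed, and a total of 1,600 optimizer steps, which is the step count reported under "Training results". It mirrors the shape of the linear schedule in Transformers with zero warmup. It is an illustration, not code taken from the training script.

```python
# Sketch of the linear LR schedule. Assumptions: no warmup (none listed) and
# 1,600 total optimizer steps (from the training-results table).

BASE_LR = 5e-05
TOTAL_STEPS = 1600
WARMUP_STEPS = 0


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 400, 800, 1200, 1600):
    print(step, linear_lr(step))
```

With zero warmup, training starts at 5e-05, reaches half that value at step 800, and decays to 0 at the final step.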

Training results

Training Loss | Epoch | Step | Validation Loss | Accuracy
------------- | ----- | ---- | --------------- | --------
0.0           | 1.0   | 1600 | 0.0000          | 1.0
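A back-of-the-envelope check relates the step count in this table to the training batch size. It assumes a single device, no gradient accumulation, and no dropping of the last partial batch, so that each optimizer step consumes one batch of 8 images. Under those assumptions, 1,600 steps in one epoch implies a training split of roughly 12,800 images. The card does not state the actual split size. The helper names are hypothetical.

```python
# Back-of-the-envelope check. Assumptions: single device, no gradient
# accumulation, last partial batch kept, so steps = ceil(examples / batch_size).
import math

TRAIN_BATCH_SIZE = 8
STEPS_PER_EPOCH = 1600


def steps_per_epoch(num_examples, batch_size):
    """Optimizer steps in one epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


def train_set_size_range(steps, batch_size):
    """Smallest and largest dataset sizes that yield `steps` batches per epoch."""
    return (steps - 1) * batch_size + 1, steps * batch_size


low, high = train_set_size_range(STEPS_PER_EPOCH, TRAIN_BATCH_SIZE)
print(low, high)  # 12793 12800
```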

Framework versions

  • Transformers 4.39.2
  • PyTorch 2.2.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2

Model size

  • Parameters: 5.53M
  • Tensor type: F32 (Safetensors weights)
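The 5.53M-parameter F32 figure above gives a rough estimate of the stored weight size: parameter count times bytes per element. This assumes every parameter is stored as F32 (4 bytes). It also ignores file metadata, and 5.53M is itself a rounded figure. The half-precision entries in the lookup table are included only for comparison. The helper name `weight_bytes` is hypothetical.

```python
# Rough weight-size estimate. Assumptions: all parameters stored as F32
# (4 bytes each), metadata ignored, 5.53M taken as exactly 5,530,000.

BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2, "I8": 1}


def weight_bytes(num_params, dtype="F32"):
    """Approximate tensor storage in bytes for `num_params` elements of `dtype`."""
    return num_params * BYTES_PER_ELEMENT[dtype]


size = weight_bytes(5_530_000)
print(f"{size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")  # 22.1 MB (21.1 MiB)
```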