	---
	library_name: tf-keras
	---

	## Model description
	This model is an implementation of the distillation recipe proposed in DeiT.
	See the Keras example on [Distilling Vision Transformers](https://keras.io/examples/vision/deit/) for details.

	Full credits to: [Sayak Paul](https://twitter.com/RisingSayak)

	In the original Vision Transformers (ViT) paper (Dosovitskiy et al.), the authors concluded that to perform on par with Convolutional Neural Networks (CNNs), ViTs need to be pre-trained on larger datasets. The larger the better. This is mainly due to the lack of inductive biases in the ViT architecture -- unlike CNNs, they don't have layers that exploit locality.

	Many groups have proposed different ways to deal with the data-intensiveness of ViT training. One such way was shown in the Data-efficient image Transformers (DeiT) paper (Touvron et al.). The authors introduced a distillation technique that is specific to transformer-based vision models. DeiT is among the first works to show that it's possible to train ViTs well without using larger datasets.
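As a rough illustration of the distillation idea, here is a minimal pure-Python sketch of a hard-label distillation loss. It assumes the hard-label variant from the DeiT paper, in which the student emits two sets of logits (one from the class token, one from the distillation token) and the teacher's predicted class serves as the target for the distillation head. The function names are hypothetical and this is not the code used to train this model; see the Keras example for the actual implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one example against an integer class label."""
    return -math.log(softmax(logits)[label])


def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, true_label):
    """Average of the two student losses (hypothetical helper).

    cls_logits:     student logits from the class-token head
    dist_logits:    student logits from the distillation-token head
    teacher_logits: logits from the (frozen) teacher model
    true_label:     ground-truth class index
    """
    # The teacher's hard prediction is the target for the distillation head.
    teacher_label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    student_loss = cross_entropy(cls_logits, true_label)
    distill_loss = cross_entropy(dist_logits, teacher_label)
    return 0.5 * (student_loss + distill_loss)


# Toy example with 5 classes (TF Flowers also has 5 classes).
loss = hard_distillation_loss(
    cls_logits=[2.0, 0.1, 0.1, 0.1, 0.1],
    dist_logits=[0.1, 0.1, 1.5, 0.1, 0.1],
    teacher_logits=[0.2, 0.1, 3.0, 0.1, 0.1],
    true_label=0,
)
print(f"hard distillation loss: {loss:.4f}")
```

The two heads can disagree during training: the class-token head is pulled toward the ground truth, while the distillation head is pulled toward whatever the teacher predicts.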

	## Intended uses & limitations

	The model was trained for demonstration purposes and is not guaranteed to give the best results in production.
	For better results, follow the [Keras example](https://keras.io/examples/vision/deit/) and tune it to your needs.

	## Training and evaluation data

	The model is trained and evaluated on the [TF Flowers dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers).

	## Training procedure

	The training procedure follows the [Keras example](https://keras.io/examples/vision/deit/) exactly, with one exception:
	the batch size is reduced from the original 256 to 16 so that the model fits in the memory of a single V100 GPU.

	### Training hyperparameters

	The following hyperparameters were used during training:

	| name | learning_rate | decay | beta_1 | beta_2 | epsilon | amsgrad | weight_decay | exclude_from_weight_decay | training_precision |
	|----|-------------|-----|------|------|-------|-------|------------|-------------------------|------------------|
	|AdamW|6.25000029685907e-05|0.0|0.8999999761581421|0.9990000128746033|1e-07|False|9.999999747378752e-05|None|float32|
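The long decimals in the table look like float32 roundings of round values. The sketch below checks that reading using only the standard library. The nominal values (`6.25e-5`, `0.9`, `0.999`, `1e-4`) are assumptions inferred from the logged numbers, not settings stated in this card, and the helper names are hypothetical.

```python
import math
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


# (assumed nominal value, value logged in the table above)
NOMINAL_TO_LOGGED = {
    "learning_rate": (6.25e-5, 6.25000029685907e-05),
    "beta_1": (0.9, 0.8999999761581421),
    "beta_2": (0.999, 0.9990000128746033),
    "weight_decay": (1e-4, 9.999999747378752e-05),
}


def matches_table(nominal, logged):
    """True if the float32 rounding of `nominal` agrees with the logged value."""
    return math.isclose(to_float32(nominal), logged, rel_tol=1e-9)


for name, (nominal, logged) in NOMINAL_TO_LOGGED.items():
    print(name, nominal, to_float32(nominal), matches_table(nominal, logged))
```

Under that assumption, the table describes AdamW with a learning rate of about 6.25e-5, betas of about 0.9 and 0.999, and a weight decay of about 1e-4.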

	## Model Plot

	<details>
	<summary>View Model Plot</summary>

	![Model Image](./model.png)

	</details>