Model description

Image classification with ConvMixer

Keras Example Link

In the Patches Are All You Need paper, the authors extend the idea of using patches to train an all-convolutional network and demonstrate competitive results. Their architecture, ConvMixer, borrows recipes from recent isotropic architectures such as ViT and MLP-Mixer (Tolstikhin et al.): the same depth and resolution across the layers of the network, residual connections, and so on.

ConvMixer is very similar to the MLP-Mixer model, with the following key differences: instead of fully-connected layers, it uses standard convolution layers; and instead of LayerNorm (which is typical for ViTs and MLP-Mixers), it uses BatchNorm.
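The differences described above can be sketched in Keras as follows. This is a minimal illustration, not the exact code behind this checkpoint: the helper names (`activation_block`, `conv_stem`, `conv_mixer_block`, `build_conv_mixer`) and the default sizes (256 filters, depth 8, kernel size 5, patch size 2) are assumptions chosen for 32x32 CIFAR-10 images and are not stated in this card.

```python
from tensorflow import keras
from tensorflow.keras import layers


def activation_block(x):
    # GELU followed by BatchNorm (ConvMixer uses BatchNorm rather than LayerNorm).
    x = layers.Activation("gelu")(x)
    return layers.BatchNormalization()(x)


def conv_stem(x, filters, patch_size):
    # Patch embedding: a convolution whose kernel size equals its stride,
    # so each output position sees exactly one non-overlapping patch.
    x = layers.Conv2D(filters, kernel_size=patch_size, strides=patch_size)(x)
    return activation_block(x)


def conv_mixer_block(x, filters, kernel_size):
    # Depthwise convolution mixes spatial locations; a residual connection
    # adds the block input back after the activation block.
    x0 = x
    x = layers.DepthwiseConv2D(kernel_size=kernel_size, padding="same")(x)
    x = layers.Add()([activation_block(x), x0])
    # Pointwise (1x1) convolution mixes channels.
    x = layers.Conv2D(filters, kernel_size=1)(x)
    return activation_block(x)


def build_conv_mixer(image_size=32, filters=256, depth=8,
                     kernel_size=5, patch_size=2, num_classes=10):
    # All default values here are illustrative assumptions.
    inputs = keras.Input((image_size, image_size, 3))
    x = layers.Rescaling(scale=1.0 / 255)(inputs)
    x = conv_stem(x, filters, patch_size)
    # Isotropic body: every block keeps the same width and resolution.
    for _ in range(depth):
        x = conv_mixer_block(x, filters, kernel_size)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)


model = build_conv_mixer()
```

Calling `model.summary()` on the result shows that the feature map keeps the same spatial size and channel count through every block, which is what "isotropic" refers to.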

Full credit to Sayak Paul for this work.

Intended uses & limitations

More information needed

Training and evaluation data

Trained and evaluated on the CIFAR-10 dataset.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Hyperparameter               Value
name                         AdamW
learning_rate                0.0010000000474974513
decay                        0.0
beta_1                       0.8999999761581421
beta_2                       0.9990000128746033
epsilon                      1e-07
amsgrad                      False
weight_decay                 9.999999747378752e-05
exclude_from_weight_decay    None
training_precision           float32
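The long decimals in the hyperparameters are the float32 roundings of 0.001, 0.9, 0.999 and 1e-4. As a rough sketch, an equivalent optimizer could be configured as below. The card does not say which AdamW implementation was used for training; `keras.optimizers.AdamW` is an assumption here and requires a recent TensorFlow release. The `decay` and `exclude_from_weight_decay` entries are left at their defaults and are not passed.

```python
from tensorflow import keras

# Values taken from the hyperparameter listing, written in rounded form.
# Using keras.optimizers.AdamW is an assumption; the card names only "AdamW".
optimizer = keras.optimizers.AdamW(
    learning_rate=1e-3,
    weight_decay=1e-4,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
    amsgrad=False,
)
```

The optimizer can then be passed to `model.compile(optimizer=optimizer, ...)` together with a loss suited to CIFAR-10 labels.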

Training Metrics

Model history needed

Model Plot

[Figure: plot of the model architecture]
