ImageNet Results
In the ImageNet experiment, we assessed how Mice ViTs perform on a more complex and diverse dataset by training them to classify the 1000 ImageNet classes.
Training Details
As in the dSprites experiment, we trained two model variants for each number of attention layers: an attention-only model and a model that combines attention with the MLP module. For simplicity, neither dropout nor layer normalization was applied. The detailed training logs and metrics can be found here.
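The experimental grid described above can be sketched as a small configuration table. This is only an illustrative sketch: the size names and layer counts come from the results table, while the `MiceViTConfig` class and its field names are hypothetical and are not the training code's actual schema.

```python
from dataclasses import dataclass
from itertools import product

# Size names and layer counts as listed in the results table.
SIZE_TO_LAYERS = {"tiny": 1, "base": 2, "small": 3, "medium": 4}


@dataclass(frozen=True)
class MiceViTConfig:
    """Hypothetical config record; field names are assumptions."""
    size: str
    n_layers: int
    use_mlp: bool                 # False means the attention-only variant
    use_layer_norm: bool = False  # not applied in this experiment
    dropout: float = 0.0          # not applied in this experiment
    n_classes: int = 1000         # ImageNet classes


def build_grid():
    """Return one config per (size, variant) pair: 4 sizes x 2 variants."""
    return [
        MiceViTConfig(size=size, n_layers=n_layers, use_mlp=use_mlp)
        for (size, n_layers), use_mlp in product(SIZE_TO_LAYERS.items(), (True, False))
    ]


configs = build_grid()
for cfg in configs:
    variant = "Attention+MLP" if cfg.use_mlp else "AttentionOnly"
    print(f"{cfg.size:>6} | layers={cfg.n_layers} | {variant}")
```

Keeping the variant as a single boolean makes it easy to see that the two models in each row differ only in whether the MLP module is present.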
Table of Results
The table below reports the top-1 accuracy (Acc) and top-5 accuracy (Top5 Acc), in that order, for each Mice ViT configuration and model variant.
Size | NumLayers | Attention+MLP Acc | Attention+MLP Top5 Acc | AttentionOnly Acc | AttentionOnly Top5 Acc | Model Link |
---|---|---|---|---|---|---|
tiny | 1 | 0.16 | 0.33 | 0.11 | 0.25 | AttentionOnly, Attention+MLP |
base | 2 | 0.23 | 0.44 | 0.16 | 0.34 | AttentionOnly, Attention+MLP |
small | 3 | 0.28 | 0.51 | 0.17 | 0.35 | AttentionOnly, Attention+MLP |
medium | 4 | 0.33 | 0.56 | 0.17 | 0.36 | AttentionOnly, Attention+MLP |
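For readers unfamiliar with the two metrics, the following is a minimal pure-Python sketch of how top-1 and top-5 accuracy are conventionally computed from per-class scores. The `topk_accuracy` helper and the toy scores are illustrative assumptions, not the evaluation code used to produce the numbers above.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 4 samples, 6 classes, made-up scores.
scores = [
    [0.10, 0.50, 0.20, 0.10, 0.05, 0.05],  # true class 1 is ranked first
    [0.30, 0.10, 0.40, 0.10, 0.05, 0.05],  # true class 0 is ranked second
    [0.05, 0.10, 0.15, 0.20, 0.20, 0.30],  # true class 0 is ranked last
    [0.60, 0.10, 0.10, 0.10, 0.05, 0.05],  # true class 0 is ranked first
]
labels = [1, 0, 0, 0]

top1 = topk_accuracy(scores, labels, k=1)
top5 = topk_accuracy(scores, labels, k=5)
print(f"top-1: {top1:.2f}, top-5: {top5:.2f}")
```

Top-5 accuracy is always at least as high as top-1 accuracy, which matches the pattern in every row of the table.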