namangarg110 committed on
Commit ff0af2d
1 Parent(s): 7070aaf

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -1,3 +1,13 @@
  ---
  license: cc-by-nc-4.0
  ---
+ # Hiera (hiera_base_224)
+
+ Hiera is a hierarchical vision transformer that is a much more efficient alternative to previous hierarchical models such as ConvNeXt and Swin.
+ Vanilla transformer architectures (ViT; Dosovitskiy et al., 2020) are simple, scalable, and very popular, and they enable pretraining strategies such as MAE (He et al., 2022).
+ However, because they use the same spatial resolution and number of channels throughout the network, ViTs make inefficient use of their parameters.
+ This is in contrast to prior “hierarchical” or “multi-scale” models (e.g., Krizhevsky et al. (2012); He et al. (2016)), which use fewer channels but higher spatial resolution in early stages
+ with simpler features, and more channels but lower spatial resolution later in the model with more complex features; this stage-wise progression is sketched below.
+ Those hierarchical models, however, add overhead modules in pursuit of state-of-the-art ImageNet-1k accuracy, and that extra machinery makes them slower.
+ Hiera addresses this by teaching the model the necessary spatial biases through MAE pretraining rather than through extra architectural components.
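+
+ The efficiency argument rests on this stage-wise trade-off between resolution and width. A minimal sketch of the progression, assuming the four-stage channel widths reported for Hiera-B and an initial patch stride of 4 at a 224x224 input (both assumptions, not taken from this card):
+
+ ```python
+ # Illustrative only: token resolution and channel width per stage of a
+ # typical 4-stage hierarchical backbone (widths assumed from Hiera-B).
+ input_res = 224
+ patch_stride = 4                  # initial patch-embedding stride (assumption)
+ stage_channels = [96, 192, 384, 768]
+
+ res = input_res // patch_stride   # 56x56 tokens after patch embedding
+ for stage, channels in enumerate(stage_channels, start=1):
+     print(f"stage {stage}: {res}x{res} tokens, {channels} channels")
+     res //= 2                     # 2x spatial downsampling between stages
+ ```
+
+ Early stages process many tokens with few channels (simple features); later stages process few tokens with many channels (complex features).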
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6141a88b3a0ec78603c9e784/ogkud4qc564bPX3f0bGXO.png)
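+
+ A minimal usage sketch (not part of this commit): it assumes the torch.hub entry point of the upstream facebookresearch/hiera repository and the `hiera_base_224` / `mae_in1k_ft_in1k` names; check that repository for the exact interface.
+
+ ```python
+ import torch
+
+ # Assumed hub interface; the model and checkpoint names follow the upstream
+ # repo's naming scheme and may differ.
+ model = torch.hub.load(
+     "facebookresearch/hiera",
+     model="hiera_base_224",
+     pretrained=True,
+     checkpoint="mae_in1k_ft_in1k",
+ )
+ model.eval()
+
+ # Dummy 224x224 RGB input; in practice, apply the standard ImageNet
+ # resize/crop/normalize preprocessing first.
+ x = torch.randn(1, 3, 224, 224)
+ with torch.no_grad():
+     logits = model(x)
+
+ print(logits.shape)  # expected: [1, 1000] for the ImageNet-1k classification head
+ ```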