gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k

Audio Classification

gaunernst committed on Mar 6, 2024

Commit 8dd1730 · verified · 1 parent: 59db684

fix typo

Files changed (1)

README.md +1 -1

@@ -8,7 +8,7 @@ pipeline_tag: audio-classification
 A Vision Transformer (ViT) for audio. Pretrained on AudioSet-2M with Self-Supervised Masked Autoencoder (MAE) method, and fine-tuned on AudioSet-20k.
-- This is a port of AudioMAE ViT-B/32 weights for usage with `timm`. The naming convention is adopted from other `timm`'s ViT models.
+- This is a port of AudioMAE ViT-B/16 weights for usage with `timm`. The naming convention is adopted from other `timm`'s ViT models.
 - See the original repo here: https://github.com/facebookresearch/AudioMAE
 - For the AudioSet-2M pre-trained checkpoint (without Audioset-20k fine-tuning), see https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m
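
The corrected patch size can be cross-checked against the model name itself. The snippet below is a minimal sketch, assuming the name follows a `vit_<size>_patch<P>_<A>_<B>[.tag]` pattern in which `<A>` and `<B>` are the two input dimensions; the helper `parse_vit_name` is hypothetical and not part of `timm`.

```python
import re


def parse_vit_name(name: str) -> dict:
    """Split an (assumed) `vit_<size>_patch<P>_<A>_<B>[.tag]` name into its parts."""
    arch = name.split(".", 1)[0]  # drop the pretrained tag after the dot
    m = re.fullmatch(r"vit_(\w+?)_patch(\d+)_(\d+)_(\d+)", arch)
    if m is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    size = m.group(1)
    patch, dim_a, dim_b = (int(g) for g in m.group(2, 3, 4))
    return {
        "size": size,
        "patch": patch,
        "input": (dim_a, dim_b),
        # number of patches along each input dimension
        "grid": (dim_a // patch, dim_b // patch),
    }


info = parse_vit_name("vit_base_patch16_1024_128.audiomae_as2m_ft_as20k")
print(info["size"], info["patch"], info["grid"])  # base 16 (64, 8)
```

Under that assumption the name encodes a base-size model with 16-wide patches, which agrees with the `ViT-B/16` wording introduced by this commit.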