fix typo
README.md
CHANGED
@@ -8,7 +8,7 @@ pipeline_tag: audio-classification
 
 A Vision Transformer (ViT) for audio. Pretrained on AudioSet-2M with Self-Supervised Masked Autoencoder (MAE) method, and fine-tuned on AudioSet-20k.
 
-- This is a port of AudioMAE ViT-B/
+- This is a port of AudioMAE ViT-B/16 weights for usage with `timm`. The naming convention is adopted from other `timm`'s ViT models.
 - See the original repo here: https://github.com/facebookresearch/AudioMAE
 - For the AudioSet-2M pre-trained checkpoint (without Audioset-20k fine-tuning), see https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m
 
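
The corrected line says the checkpoint name follows the convention of other `timm` ViT models. As a rough illustration only, the sketch below splits the model id from the URL above into its parts. The helper `parse_model_name` and the `ParsedName` type are hypothetical and not part of `timm`. Reading `1024_128` as the input spectrogram size (time frames by mel bins) is an assumption based on the name, not something the README states.

```python
import re
from typing import NamedTuple, Tuple


class ParsedName(NamedTuple):
    architecture: str             # part before the dot
    pretrained_tag: str           # part after the dot
    patch_size: int               # from "patch16"
    input_size: Tuple[int, int]   # ASSUMED to be (time frames, mel bins)


def parse_model_name(name: str) -> ParsedName:
    """Hypothetical helper: split a timm-style id of the form '<arch>.<tag>'."""
    repo_name = name.split("/")[-1]            # drop an optional "user/" prefix
    architecture, _, tag = repo_name.partition(".")
    match = re.search(r"patch(\d+)_(\d+)_(\d+)$", architecture)
    if match is None:
        raise ValueError(f"unrecognised architecture name: {architecture!r}")
    patch, height, width = (int(g) for g in match.groups())
    return ParsedName(architecture, tag, patch, (height, width))


parsed = parse_model_name("gaunernst/vit_base_patch16_1024_128.audiomae_as2m")

# Under the assumed input size, count the non-overlapping 16x16 patches.
num_patches = (parsed.input_size[0] // parsed.patch_size) * (
    parsed.input_size[1] // parsed.patch_size
)
print(parsed)
print(num_patches)
```

Under that reading, the name encodes a ViT-Base with 16x16 patches over a 1024x128 input, and the suffix after the dot identifies the pretrained weights.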