patrickvonplaten
committed on
Commit
·
97d26f9
1
Parent(s):
5b9ecc0
correct vocab
README.md
CHANGED
@@ -1,6 +1,1496 @@
|
|
1 |
---
|
2 |
tags:
|
3 |
- mms
|
4 |
---
|
5 |
|
6 |
-
|
1 |
---
|
2 |
tags:
|
3 |
- mms
|
4 |
+
language:
|
5 |
+
- ab
|
6 |
+
- af
|
7 |
+
- ak
|
8 |
+
- am
|
9 |
+
- ar
|
10 |
+
- as
|
11 |
+
- av
|
12 |
+
- ay
|
13 |
+
- az
|
14 |
+
- ba
|
15 |
+
- bm
|
16 |
+
- be
|
17 |
+
- bn
|
18 |
+
- bi
|
19 |
+
- bo
|
20 |
+
- sh
|
21 |
+
- br
|
22 |
+
- bg
|
23 |
+
- ca
|
24 |
+
- cs
|
25 |
+
- ce
|
26 |
+
- cv
|
27 |
+
- ku
|
28 |
+
- cy
|
29 |
+
- da
|
30 |
+
- de
|
31 |
+
- dv
|
32 |
+
- dz
|
33 |
+
- el
|
34 |
+
- en
|
35 |
+
- eo
|
36 |
+
- et
|
37 |
+
- eu
|
38 |
+
- ee
|
39 |
+
- fo
|
40 |
+
- fa
|
41 |
+
- fj
|
42 |
+
- fi
|
43 |
+
- fr
|
44 |
+
- fy
|
45 |
+
- ff
|
46 |
+
- ga
|
47 |
+
- gl
|
48 |
+
- gn
|
49 |
+
- gu
|
50 |
+
- zh
|
51 |
+
- ht
|
52 |
+
- ha
|
53 |
+
- he
|
54 |
+
- hi
|
55 |
+
- sh
|
56 |
+
- hu
|
57 |
+
- hy
|
58 |
+
- ig
|
59 |
+
- ia
|
60 |
+
- ms
|
61 |
+
- is
|
62 |
+
- it
|
63 |
+
- jv
|
64 |
+
- ja
|
65 |
+
- kn
|
66 |
+
- ka
|
67 |
+
- kk
|
68 |
+
- kr
|
69 |
+
- km
|
70 |
+
- ki
|
71 |
+
- rw
|
72 |
+
- ky
|
73 |
+
- ko
|
74 |
+
- kv
|
75 |
+
- lo
|
76 |
+
- la
|
77 |
+
- lv
|
78 |
+
- ln
|
79 |
+
- lt
|
80 |
+
- lb
|
81 |
+
- lg
|
82 |
+
- mh
|
83 |
+
- ml
|
84 |
+
- mr
|
85 |
+
- ms
|
86 |
+
- mk
|
87 |
+
- mg
|
88 |
+
- mt
|
89 |
+
- mn
|
90 |
+
- mi
|
91 |
+
- my
|
92 |
+
- zh
|
93 |
+
- nl
|
94 |
+
- 'no'
|
95 |
+
- 'no'
|
96 |
+
- ne
|
97 |
+
- ny
|
98 |
+
- oc
|
99 |
+
- om
|
100 |
+
- or
|
101 |
+
- os
|
102 |
+
- pa
|
103 |
+
- pl
|
104 |
+
- pt
|
105 |
+
- ms
|
106 |
+
- ps
|
107 |
+
- qu
|
108 |
+
- qu
|
109 |
+
- qu
|
110 |
+
- qu
|
111 |
+
- qu
|
112 |
+
- qu
|
113 |
+
- qu
|
114 |
+
- qu
|
115 |
+
- qu
|
116 |
+
- qu
|
117 |
+
- qu
|
118 |
+
- qu
|
119 |
+
- qu
|
120 |
+
- qu
|
121 |
+
- qu
|
122 |
+
- qu
|
123 |
+
- qu
|
124 |
+
- qu
|
125 |
+
- qu
|
126 |
+
- qu
|
127 |
+
- qu
|
128 |
+
- qu
|
129 |
+
- ro
|
130 |
+
- rn
|
131 |
+
- ru
|
132 |
+
- sg
|
133 |
+
- sk
|
134 |
+
- sl
|
135 |
+
- sm
|
136 |
+
- sn
|
137 |
+
- sd
|
138 |
+
- so
|
139 |
+
- es
|
140 |
+
- sq
|
141 |
+
- su
|
142 |
+
- sv
|
143 |
+
- sw
|
144 |
+
- ta
|
145 |
+
- tt
|
146 |
+
- te
|
147 |
+
- tg
|
148 |
+
- tl
|
149 |
+
- th
|
150 |
+
- ti
|
151 |
+
- ts
|
152 |
+
- tr
|
153 |
+
- uk
|
154 |
+
- ms
|
155 |
+
- vi
|
156 |
+
- wo
|
157 |
+
- xh
|
158 |
+
- ms
|
159 |
+
- yo
|
160 |
+
- ms
|
161 |
+
- zu
|
162 |
+
- za
|
163 |
+
license: cc-by-sa-4.0
|
164 |
+
datasets:
|
165 |
+
- google/fleurs
|
166 |
+
metrics:
|
167 |
+
- wer
|
168 |
---
|
169 |
|
170 |
+
# Massively Multilingual Speech (MMS) - Finetuned ASR - ALL
|
171 |
+
|
172 |
+
This checkpoint is a model fine-tuned for multilingual ASR and is part of Facebook's [Massively Multilingual Speech project](https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/).
|
173 |
+
This checkpoint is based on the [Wav2Vec2 architecture](https://huggingface.co/docs/transformers/model_doc/wav2vec2) and makes use of adapter models to transcribe 1000+ languages.
|
174 |
+
The checkpoint consists of **1 billion parameters** and has been fine-tuned from [facebook/mms-1b](https://huggingface.co/facebook/mms-1b) on 1162 languages.
|
175 |
+
|
176 |
+
## Table of Contents
|
177 |
+
|
178 |
+
- [Example](#example)
|
179 |
+
- [Supported Languages](#supported-languages)
|
180 |
+
- [Model details](#model-details)
|
181 |
+
- [Additional links](#additional-links)
|
182 |
+
|
183 |
+
## Example
|
184 |
+
|
185 |
+
This MMS checkpoint can be used with [Transformers](https://github.com/huggingface/transformers) to transcribe audio of 1162 different
|
186 |
+
languages. Let's look at a simple example.
|
187 |
+
|
188 |
+
First, we install `transformers` and some other libraries:
|
189 |
+
```
|
190 |
+
pip install torch accelerate torchaudio datasets
|
191 |
+
pip install --upgrade transformers
|
192 |
+
```
|
193 |
+
|
194 |
+
**Note**: To use MMS you need `transformers >= 4.30` installed. If the `4.30` version
|
195 |
+
is not yet available [on PyPI](https://pypi.org/project/transformers/) make sure to install `transformers` from
|
196 |
+
source:
|
197 |
+
```
|
198 |
+
pip install git+https://github.com/huggingface/transformers.git
|
199 |
+
```
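The version requirement above can be checked programmatically. The following is a minimal sketch, not part of the original README: the helper names `version_tuple` and `meets_minimum` are hypothetical, and the comparison only looks at the leading numeric components, so a source build such as `4.30.0.dev0` is treated as meeting the minimum.

```python
def version_tuple(version: str) -> tuple:
    """Return the leading numeric components of a version string as integers."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as "dev0" or "rc1"
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.30") -> bool:
    """True if `installed` is at least `minimum`, compared component by component."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("4.29.2"))  # False
print(meets_minimum("4.30.0"))  # True
print(meets_minimum("4.9.0"))   # False, although "4.9.0" > "4.30" as plain strings
```

With `transformers` installed, the same check could be run as `meets_minimum(transformers.__version__)`.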
|
200 |
+
|
201 |
+
Next, we load a couple of audio samples via `datasets`. Make sure that the audio data is sampled at 16,000 Hz (16 kHz).
|
202 |
+
|
203 |
+
```py
|
204 |
+
from datasets import load_dataset, Audio
|
205 |
+
|
206 |
+
# English
|
207 |
+
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
|
208 |
+
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
|
209 |
+
en_sample = next(iter(stream_data))["audio"]["array"]
|
210 |
+
|
211 |
+
# French
|
212 |
+
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True)
|
213 |
+
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
|
214 |
+
fr_sample = next(iter(stream_data))["audio"]["array"]
|
215 |
+
```
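To illustrate what "resampling to 16 kHz" means for a raw waveform, here is a rough sketch that uses plain linear interpolation. It is an assumption-laden toy: the function name `resample_linear` is hypothetical, and the code applies no anti-aliasing filter, so for real audio the `Audio(sampling_rate=16000)` cast shown above (or a dedicated resampler) is the better choice.

```python
def resample_linear(samples, orig_rate, target_rate=16_000):
    """Resample a mono signal by linear interpolation (no anti-aliasing filter)."""
    if orig_rate == target_rate or len(samples) < 2:
        return list(samples)
    n_out = int(round(len(samples) * target_rate / orig_rate))
    out = []
    for i in range(n_out):
        pos = i * orig_rate / target_rate        # fractional index in the source
        left = min(int(pos), len(samples) - 1)   # neighbour at or before pos
        right = min(left + 1, len(samples) - 1)  # neighbour after pos, clamped
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of a 48 kHz ramp becomes 16,000 samples at 16 kHz.
one_second = [float(i) for i in range(48_000)]
print(len(resample_linear(one_second, 48_000)))
```

The point of the sketch is the bookkeeping: one second of audio must end up as 16,000 samples before it is passed to the processor.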
|
216 |
+
|
217 |
+
Next, we load the model and processor:
|
218 |
+
|
219 |
+
```py
|
220 |
+
from transformers import Wav2Vec2ForCTC, AutoProcessor
|
221 |
+
import torch
|
222 |
+
|
223 |
+
model_id = "facebook/mms-1b-all"
|
224 |
+
|
225 |
+
processor = AutoProcessor.from_pretrained(model_id)
|
226 |
+
model = Wav2Vec2ForCTC.from_pretrained(model_id)
|
227 |
+
```
|
228 |
+
|
229 |
+
Now we process the audio data, pass it to the model, and transcribe the model output, just as we usually do for Wav2Vec2 models such as [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h):
|
230 |
+
|
231 |
+
```py
|
232 |
+
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
|
233 |
+
|
234 |
+
with torch.no_grad():
|
235 |
+
outputs = model(**inputs).logits
|
236 |
+
|
237 |
+
ids = torch.argmax(outputs, dim=-1)[0]
|
238 |
+
transcription = processor.decode(ids)
|
239 |
+
# 'joe keton disapproved of films and buster also had reservations about the media'
|
240 |
+
```
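The `argmax` followed by `processor.decode` step above is greedy CTC decoding. The sketch below shows the idea on a toy example. It assumes that id `0` (the pad token) doubles as the CTC blank and that `|` marks word boundaries, as is common for Wav2Vec2-style CTC tokenizers; the vocabulary and the function name `ctc_greedy_decode` are hypothetical and not taken from this checkpoint.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated ids, drop blanks, and join the remaining tokens."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame and is not blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; the real one comes from the checkpoint's tokenizer.
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "e", 4: "l", 5: "o", 6: "i"}
# Per-frame argmax ids: "h h _ i | | h e l l _ l o" with "_" as the blank.
frames = [2, 2, 0, 6, 1, 1, 2, 3, 4, 4, 0, 4, 5]
print(ctc_greedy_decode(frames, vocab))  # hi hello
```

The blank between the two runs of `l` is what lets CTC emit a doubled letter. Without it, the repeated frames would collapse into a single `l`.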
|
241 |
+
|
242 |
+
We can now keep the same model in memory and simply switch out the language adapters by calling the convenient `load_adapter()` function for the model and `set_target_lang()` for the tokenizer. We pass the target language as an input, "fra" for French.
|
243 |
+
|
244 |
+
```py
|
245 |
+
processor.tokenizer.set_target_lang("fra")
|
246 |
+
model.load_adapter("fra")
|
247 |
+
|
248 |
+
inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt")
|
249 |
+
|
250 |
+
with torch.no_grad():
|
251 |
+
outputs = model(**inputs).logits
|
252 |
+
|
253 |
+
ids = torch.argmax(outputs, dim=-1)[0]
|
254 |
+
transcription = processor.decode(ids)
|
255 |
+
# "ce dernier est volé tout au long de l'histoire romaine"
|
256 |
+
```
|
257 |
+
|
258 |
+
The language can be switched in the same way for all other supported languages. To see which language codes are available, have a look at:
|
259 |
+
```py
|
260 |
+
processor.tokenizer.vocab.keys()
|
261 |
+
```
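Since the keys above are the accepted language codes, a requested code can be validated before switching adapters. This is a small sketch under stated assumptions: `check_language` is a hypothetical helper, and the short `supported` list stands in for `processor.tokenizer.vocab.keys()`.

```python
import difflib


def check_language(lang, supported):
    """Return `lang` if it is supported, otherwise raise with near matches."""
    supported = list(supported)
    if lang in supported:
        return lang
    # Suggest up to three similar codes to help catch typos.
    hints = difflib.get_close_matches(lang, supported, n=3)
    suffix = f" Did you mean one of {hints}?" if hints else ""
    raise ValueError(f"Unsupported language code {lang!r}.{suffix}")


# Hypothetical excerpt; in practice pass processor.tokenizer.vocab.keys().
supported = ["eng", "fra", "deu", "azj-script_latin"]
print(check_language("fra", supported))  # fra
```

Calling the helper before `set_target_lang()` and `load_adapter()` turns a mistyped code into an early, readable error.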
|
262 |
+
|
263 |
+
For more details, please have a look at [the official docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms).
|
264 |
+
|
265 |
+
## Supported Languages
|
266 |
+
|
267 |
+
This model supports 1162 languages. Click the following to toggle all supported languages of this checkpoint, listed by [ISO 639-3 code](https://en.wikipedia.org/wiki/ISO_639-3).
|
268 |
+
You can find more details about the languages and their ISO 639-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).
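Some entries in the list are not bare ISO 639-3 codes but carry a `-script_…` or `-dialect_…` suffix, for example `azj-script_latin` or `cak-dialect_central`. The following sketch splits such an entry into its parts. The function name `split_language_code` is hypothetical, and the parsing rule is inferred from the entries themselves rather than from any documented format.

```python
def split_language_code(code):
    """Split an entry such as 'azj-script_latin' into (base, kind, value)."""
    base, sep, qualifier = code.partition("-")
    if not sep:
        return base, None, None  # plain ISO 639-3 code without a qualifier
    kind, _, value = qualifier.partition("_")
    return base, kind, value


print(split_language_code("fra"))                  # ('fra', None, None)
print(split_language_code("azj-script_cyrillic"))  # ('azj', 'script', 'cyrillic')
print(split_language_code("cak-dialect_central"))  # ('cak', 'dialect', 'central')
```

This can be handy for grouping the entries by base language, for instance to find every script variant listed for `urd`.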
|
269 |
+
<details>
|
270 |
+
<summary>Click to toggle</summary>
|
271 |
+
|
272 |
+
- abi
|
273 |
+
- abk
|
274 |
+
- abp
|
275 |
+
- aca
|
276 |
+
- acd
|
277 |
+
- ace
|
278 |
+
- acf
|
279 |
+
- ach
|
280 |
+
- acn
|
281 |
+
- acr
|
282 |
+
- acu
|
283 |
+
- ade
|
284 |
+
- adh
|
285 |
+
- adj
|
286 |
+
- adx
|
287 |
+
- aeu
|
288 |
+
- afr
|
289 |
+
- agd
|
290 |
+
- agg
|
291 |
+
- agn
|
292 |
+
- agr
|
293 |
+
- agu
|
294 |
+
- agx
|
295 |
+
- aha
|
296 |
+
- ahk
|
297 |
+
- aia
|
298 |
+
- aka
|
299 |
+
- akb
|
300 |
+
- ake
|
301 |
+
- akp
|
302 |
+
- alj
|
303 |
+
- alp
|
304 |
+
- alt
|
305 |
+
- alz
|
306 |
+
- ame
|
307 |
+
- amf
|
308 |
+
- amh
|
309 |
+
- ami
|
310 |
+
- amk
|
311 |
+
- ann
|
312 |
+
- any
|
313 |
+
- aoz
|
314 |
+
- apb
|
315 |
+
- apr
|
316 |
+
- ara
|
317 |
+
- arl
|
318 |
+
- asa
|
319 |
+
- asg
|
320 |
+
- asm
|
321 |
+
- ast
|
322 |
+
- ata
|
323 |
+
- atb
|
324 |
+
- atg
|
325 |
+
- ati
|
326 |
+
- atq
|
327 |
+
- ava
|
328 |
+
- avn
|
329 |
+
- avu
|
330 |
+
- awa
|
331 |
+
- awb
|
332 |
+
- ayo
|
333 |
+
- ayr
|
334 |
+
- ayz
|
335 |
+
- azb
|
336 |
+
- azg
|
337 |
+
- azj-script_cyrillic
|
338 |
+
- azj-script_latin
|
339 |
+
- azz
|
340 |
+
- bak
|
341 |
+
- bam
|
342 |
+
- ban
|
343 |
+
- bao
|
344 |
+
- bas
|
345 |
+
- bav
|
346 |
+
- bba
|
347 |
+
- bbb
|
348 |
+
- bbc
|
349 |
+
- bbo
|
350 |
+
- bcc-script_arabic
|
351 |
+
- bcc-script_latin
|
352 |
+
- bcl
|
353 |
+
- bcw
|
354 |
+
- bdg
|
355 |
+
- bdh
|
356 |
+
- bdq
|
357 |
+
- bdu
|
358 |
+
- bdv
|
359 |
+
- beh
|
360 |
+
- bel
|
361 |
+
- bem
|
362 |
+
- ben
|
363 |
+
- bep
|
364 |
+
- bex
|
365 |
+
- bfa
|
366 |
+
- bfo
|
367 |
+
- bfy
|
368 |
+
- bfz
|
369 |
+
- bgc
|
370 |
+
- bgq
|
371 |
+
- bgr
|
372 |
+
- bgt
|
373 |
+
- bgw
|
374 |
+
- bha
|
375 |
+
- bht
|
376 |
+
- bhz
|
377 |
+
- bib
|
378 |
+
- bim
|
379 |
+
- bis
|
380 |
+
- biv
|
381 |
+
- bjr
|
382 |
+
- bjv
|
383 |
+
- bjw
|
384 |
+
- bjz
|
385 |
+
- bkd
|
386 |
+
- bkv
|
387 |
+
- blh
|
388 |
+
- blt
|
389 |
+
- blx
|
390 |
+
- blz
|
391 |
+
- bmq
|
392 |
+
- bmr
|
393 |
+
- bmu
|
394 |
+
- bmv
|
395 |
+
- bng
|
396 |
+
- bno
|
397 |
+
- bnp
|
398 |
+
- boa
|
399 |
+
- bod
|
400 |
+
- boj
|
401 |
+
- bom
|
402 |
+
- bor
|
403 |
+
- bos
|
404 |
+
- bov
|
405 |
+
- box
|
406 |
+
- bpr
|
407 |
+
- bps
|
408 |
+
- bqc
|
409 |
+
- bqi
|
410 |
+
- bqj
|
411 |
+
- bqp
|
412 |
+
- bre
|
413 |
+
- bru
|
414 |
+
- bsc
|
415 |
+
- bsq
|
416 |
+
- bss
|
417 |
+
- btd
|
418 |
+
- bts
|
419 |
+
- btt
|
420 |
+
- btx
|
421 |
+
- bud
|
422 |
+
- bul
|
423 |
+
- bus
|
424 |
+
- bvc
|
425 |
+
- bvz
|
426 |
+
- bwq
|
427 |
+
- bwu
|
428 |
+
- byr
|
429 |
+
- bzh
|
430 |
+
- bzi
|
431 |
+
- bzj
|
432 |
+
- caa
|
433 |
+
- cab
|
434 |
+
- cac-dialect_sanmateoixtatan
|
435 |
+
- cac-dialect_sansebastiancoatan
|
436 |
+
- cak-dialect_central
|
437 |
+
- cak-dialect_santamariadejesus
|
438 |
+
- cak-dialect_santodomingoxenacoj
|
439 |
+
- cak-dialect_southcentral
|
440 |
+
- cak-dialect_western
|
441 |
+
- cak-dialect_yepocapa
|
442 |
+
- cap
|
443 |
+
- car
|
444 |
+
- cas
|
445 |
+
- cat
|
446 |
+
- cax
|
447 |
+
- cbc
|
448 |
+
- cbi
|
449 |
+
- cbr
|
450 |
+
- cbs
|
451 |
+
- cbt
|
452 |
+
- cbu
|
453 |
+
- cbv
|
454 |
+
- cce
|
455 |
+
- cco
|
456 |
+
- cdj
|
457 |
+
- ceb
|
458 |
+
- ceg
|
459 |
+
- cek
|
460 |
+
- ces
|
461 |
+
- cfm
|
462 |
+
- cgc
|
463 |
+
- che
|
464 |
+
- chf
|
465 |
+
- chv
|
466 |
+
- chz
|
467 |
+
- cjo
|
468 |
+
- cjp
|
469 |
+
- cjs
|
470 |
+
- ckb
|
471 |
+
- cko
|
472 |
+
- ckt
|
473 |
+
- cla
|
474 |
+
- cle
|
475 |
+
- cly
|
476 |
+
- cme
|
477 |
+
- cmn-script_simplified
|
478 |
+
- cmo-script_khmer
|
479 |
+
- cmo-script_latin
|
480 |
+
- cmr
|
481 |
+
- cnh
|
482 |
+
- cni
|
483 |
+
- cnl
|
484 |
+
- cnt
|
485 |
+
- coe
|
486 |
+
- cof
|
487 |
+
- cok
|
488 |
+
- con
|
489 |
+
- cot
|
490 |
+
- cou
|
491 |
+
- cpa
|
492 |
+
- cpb
|
493 |
+
- cpu
|
494 |
+
- crh
|
495 |
+
- crk-script_latin
|
496 |
+
- crk-script_syllabics
|
497 |
+
- crn
|
498 |
+
- crq
|
499 |
+
- crs
|
500 |
+
- crt
|
501 |
+
- csk
|
502 |
+
- cso
|
503 |
+
- ctd
|
504 |
+
- ctg
|
505 |
+
- cto
|
506 |
+
- ctu
|
507 |
+
- cuc
|
508 |
+
- cui
|
509 |
+
- cuk
|
510 |
+
- cul
|
511 |
+
- cwa
|
512 |
+
- cwe
|
513 |
+
- cwt
|
514 |
+
- cya
|
515 |
+
- cym
|
516 |
+
- daa
|
517 |
+
- dah
|
518 |
+
- dan
|
519 |
+
- dar
|
520 |
+
- dbj
|
521 |
+
- dbq
|
522 |
+
- ddn
|
523 |
+
- ded
|
524 |
+
- des
|
525 |
+
- deu
|
526 |
+
- dga
|
527 |
+
- dgi
|
528 |
+
- dgk
|
529 |
+
- dgo
|
530 |
+
- dgr
|
531 |
+
- dhi
|
532 |
+
- did
|
533 |
+
- dig
|
534 |
+
- dik
|
535 |
+
- dip
|
536 |
+
- div
|
537 |
+
- djk
|
538 |
+
- dnj-dialect_blowowest
|
539 |
+
- dnj-dialect_gweetaawueast
|
540 |
+
- dnt
|
541 |
+
- dnw
|
542 |
+
- dop
|
543 |
+
- dos
|
544 |
+
- dsh
|
545 |
+
- dso
|
546 |
+
- dtp
|
547 |
+
- dts
|
548 |
+
- dug
|
549 |
+
- dwr
|
550 |
+
- dyi
|
551 |
+
- dyo
|
552 |
+
- dyu
|
553 |
+
- dzo
|
554 |
+
- eip
|
555 |
+
- eka
|
556 |
+
- ell
|
557 |
+
- emp
|
558 |
+
- enb
|
559 |
+
- eng
|
560 |
+
- enx
|
561 |
+
- epo
|
562 |
+
- ese
|
563 |
+
- ess
|
564 |
+
- est
|
565 |
+
- eus
|
566 |
+
- evn
|
567 |
+
- ewe
|
568 |
+
- eza
|
569 |
+
- fal
|
570 |
+
- fao
|
571 |
+
- far
|
572 |
+
- fas
|
573 |
+
- fij
|
574 |
+
- fin
|
575 |
+
- flr
|
576 |
+
- fmu
|
577 |
+
- fon
|
578 |
+
- fra
|
579 |
+
- frd
|
580 |
+
- fry
|
581 |
+
- ful
|
582 |
+
- gag-script_cyrillic
|
583 |
+
- gag-script_latin
|
584 |
+
- gai
|
585 |
+
- gam
|
586 |
+
- gau
|
587 |
+
- gbi
|
588 |
+
- gbk
|
589 |
+
- gbm
|
590 |
+
- gbo
|
591 |
+
- gde
|
592 |
+
- geb
|
593 |
+
- gej
|
594 |
+
- gil
|
595 |
+
- gjn
|
596 |
+
- gkn
|
597 |
+
- gld
|
598 |
+
- gle
|
599 |
+
- glg
|
600 |
+
- glk
|
601 |
+
- gmv
|
602 |
+
- gna
|
603 |
+
- gnd
|
604 |
+
- gng
|
605 |
+
- gof-script_latin
|
606 |
+
- gog
|
607 |
+
- gor
|
608 |
+
- gqr
|
609 |
+
- grc
|
610 |
+
- gri
|
611 |
+
- grn
|
612 |
+
- grt
|
613 |
+
- gso
|
614 |
+
- gub
|
615 |
+
- guc
|
616 |
+
- gud
|
617 |
+
- guh
|
618 |
+
- guj
|
619 |
+
- guk
|
620 |
+
- gum
|
621 |
+
- guo
|
622 |
+
- guq
|
623 |
+
- guu
|
624 |
+
- gux
|
625 |
+
- gvc
|
626 |
+
- gvl
|
627 |
+
- gwi
|
628 |
+
- gwr
|
629 |
+
- gym
|
630 |
+
- gyr
|
631 |
+
- had
|
632 |
+
- hag
|
633 |
+
- hak
|
634 |
+
- hap
|
635 |
+
- hat
|
636 |
+
- hau
|
637 |
+
- hay
|
638 |
+
- heb
|
639 |
+
- heh
|
640 |
+
- hif
|
641 |
+
- hig
|
642 |
+
- hil
|
643 |
+
- hin
|
644 |
+
- hlb
|
645 |
+
- hlt
|
646 |
+
- hne
|
647 |
+
- hnn
|
648 |
+
- hns
|
649 |
+
- hoc
|
650 |
+
- hoy
|
651 |
+
- hrv
|
652 |
+
- hsb
|
653 |
+
- hto
|
654 |
+
- hub
|
655 |
+
- hui
|
656 |
+
- hun
|
657 |
+
- hus-dialect_centralveracruz
|
658 |
+
- hus-dialect_westernpotosino
|
659 |
+
- huu
|
660 |
+
- huv
|
661 |
+
- hvn
|
662 |
+
- hwc
|
663 |
+
- hye
|
664 |
+
- hyw
|
665 |
+
- iba
|
666 |
+
- ibo
|
667 |
+
- icr
|
668 |
+
- idd
|
669 |
+
- ifa
|
670 |
+
- ifb
|
671 |
+
- ife
|
672 |
+
- ifk
|
673 |
+
- ifu
|
674 |
+
- ify
|
675 |
+
- ign
|
676 |
+
- ikk
|
677 |
+
- ilb
|
678 |
+
- ilo
|
679 |
+
- imo
|
680 |
+
- ina
|
681 |
+
- inb
|
682 |
+
- ind
|
683 |
+
- iou
|
684 |
+
- ipi
|
685 |
+
- iqw
|
686 |
+
- iri
|
687 |
+
- irk
|
688 |
+
- isl
|
689 |
+
- ita
|
690 |
+
- itl
|
691 |
+
- itv
|
692 |
+
- ixl-dialect_sangasparchajul
|
693 |
+
- ixl-dialect_sanjuancotzal
|
694 |
+
- ixl-dialect_santamarianebaj
|
695 |
+
- izr
|
696 |
+
- izz
|
697 |
+
- jac
|
698 |
+
- jam
|
699 |
+
- jav
|
700 |
+
- jbu
|
701 |
+
- jen
|
702 |
+
- jic
|
703 |
+
- jiv
|
704 |
+
- jmc
|
705 |
+
- jmd
|
706 |
+
- jpn
|
707 |
+
- jun
|
708 |
+
- juy
|
709 |
+
- jvn
|
710 |
+
- kaa
|
711 |
+
- kab
|
712 |
+
- kac
|
713 |
+
- kak
|
714 |
+
- kam
|
715 |
+
- kan
|
716 |
+
- kao
|
717 |
+
- kaq
|
718 |
+
- kat
|
719 |
+
- kay
|
720 |
+
- kaz
|
721 |
+
- kbo
|
722 |
+
- kbp
|
723 |
+
- kbq
|
724 |
+
- kbr
|
725 |
+
- kby
|
726 |
+
- kca
|
727 |
+
- kcg
|
728 |
+
- kdc
|
729 |
+
- kde
|
730 |
+
- kdh
|
731 |
+
- kdi
|
732 |
+
- kdj
|
733 |
+
- kdl
|
734 |
+
- kdn
|
735 |
+
- kdt
|
736 |
+
- kea
|
737 |
+
- kek
|
738 |
+
- ken
|
739 |
+
- keo
|
740 |
+
- ker
|
741 |
+
- key
|
742 |
+
- kez
|
743 |
+
- kfb
|
744 |
+
- kff-script_telugu
|
745 |
+
- kfw
|
746 |
+
- kfx
|
747 |
+
- khg
|
748 |
+
- khm
|
749 |
+
- khq
|
750 |
+
- kia
|
751 |
+
- kij
|
752 |
+
- kik
|
753 |
+
- kin
|
754 |
+
- kir
|
755 |
+
- kjb
|
756 |
+
- kje
|
757 |
+
- kjg
|
758 |
+
- kjh
|
759 |
+
- kki
|
760 |
+
- kkj
|
761 |
+
- kle
|
762 |
+
- klu
|
763 |
+
- klv
|
764 |
+
- klw
|
765 |
+
- kma
|
766 |
+
- kmd
|
767 |
+
- kml
|
768 |
+
- kmr-script_arabic
|
769 |
+
- kmr-script_cyrillic
|
770 |
+
- kmr-script_latin
|
771 |
+
- kmu
|
772 |
+
- knb
|
773 |
+
- kne
|
774 |
+
- knf
|
775 |
+
- knj
|
776 |
+
- knk
|
777 |
+
- kno
|
778 |
+
- kog
|
779 |
+
- kor
|
780 |
+
- kpq
|
781 |
+
- kps
|
782 |
+
- kpv
|
783 |
+
- kpy
|
784 |
+
- kpz
|
785 |
+
- kqe
|
786 |
+
- kqp
|
787 |
+
- kqr
|
788 |
+
- kqy
|
789 |
+
- krc
|
790 |
+
- kri
|
791 |
+
- krj
|
792 |
+
- krl
|
793 |
+
- krr
|
794 |
+
- krs
|
795 |
+
- kru
|
796 |
+
- ksb
|
797 |
+
- ksr
|
798 |
+
- kss
|
799 |
+
- ktb
|
800 |
+
- ktj
|
801 |
+
- kub
|
802 |
+
- kue
|
803 |
+
- kum
|
804 |
+
- kus
|
805 |
+
- kvn
|
806 |
+
- kvw
|
807 |
+
- kwd
|
808 |
+
- kwf
|
809 |
+
- kwi
|
810 |
+
- kxc
|
811 |
+
- kxf
|
812 |
+
- kxm
|
813 |
+
- kxv
|
814 |
+
- kyb
|
815 |
+
- kyc
|
816 |
+
- kyf
|
817 |
+
- kyg
|
818 |
+
- kyo
|
819 |
+
- kyq
|
820 |
+
- kyu
|
821 |
+
- kyz
|
822 |
+
- kzf
|
823 |
+
- lac
|
824 |
+
- laj
|
825 |
+
- lam
|
826 |
+
- lao
|
827 |
+
- las
|
828 |
+
- lat
|
829 |
+
- lav
|
830 |
+
- law
|
831 |
+
- lbj
|
832 |
+
- lbw
|
833 |
+
- lcp
|
834 |
+
- lee
|
835 |
+
- lef
|
836 |
+
- lem
|
837 |
+
- lew
|
838 |
+
- lex
|
839 |
+
- lgg
|
840 |
+
- lgl
|
841 |
+
- lhu
|
842 |
+
- lia
|
843 |
+
- lid
|
844 |
+
- lif
|
845 |
+
- lin
|
846 |
+
- lip
|
847 |
+
- lis
|
848 |
+
- lit
|
849 |
+
- lje
|
850 |
+
- ljp
|
851 |
+
- llg
|
852 |
+
- lln
|
853 |
+
- lme
|
854 |
+
- lnd
|
855 |
+
- lns
|
856 |
+
- lob
|
857 |
+
- lok
|
858 |
+
- lom
|
859 |
+
- lon
|
860 |
+
- loq
|
861 |
+
- lsi
|
862 |
+
- lsm
|
863 |
+
- ltz
|
864 |
+
- luc
|
865 |
+
- lug
|
866 |
+
- luo
|
867 |
+
- lwo
|
868 |
+
- lww
|
869 |
+
- lzz
|
870 |
+
- maa-dialect_sanantonio
|
871 |
+
- maa-dialect_sanjeronimo
|
872 |
+
- mad
|
873 |
+
- mag
|
874 |
+
- mah
|
875 |
+
- mai
|
876 |
+
- maj
|
877 |
+
- mak
|
878 |
+
- mal
|
879 |
+
- mam-dialect_central
|
880 |
+
- mam-dialect_northern
|
881 |
+
- mam-dialect_southern
|
882 |
+
- mam-dialect_western
|
883 |
+
- maq
|
884 |
+
- mar
|
885 |
+
- maw
|
886 |
+
- maz
|
887 |
+
- mbb
|
888 |
+
- mbc
|
889 |
+
- mbh
|
890 |
+
- mbj
|
891 |
+
- mbt
|
892 |
+
- mbu
|
893 |
+
- mbz
|
894 |
+
- mca
|
895 |
+
- mcb
|
896 |
+
- mcd
|
897 |
+
- mco
|
898 |
+
- mcp
|
899 |
+
- mcq
|
900 |
+
- mcu
|
901 |
+
- mda
|
902 |
+
- mdf
|
903 |
+
- mdv
|
904 |
+
- mdy
|
905 |
+
- med
|
906 |
+
- mee
|
907 |
+
- mej
|
908 |
+
- men
|
909 |
+
- meq
|
910 |
+
- met
|
911 |
+
- mev
|
912 |
+
- mfe
|
913 |
+
- mfh
|
914 |
+
- mfi
|
915 |
+
- mfk
|
916 |
+
- mfq
|
917 |
+
- mfy
|
918 |
+
- mfz
|
919 |
+
- mgd
|
920 |
+
- mge
|
921 |
+
- mgh
|
922 |
+
- mgo
|
923 |
+
- mhi
|
924 |
+
- mhr
|
925 |
+
- mhu
|
926 |
+
- mhx
|
927 |
+
- mhy
|
928 |
+
- mib
|
929 |
+
- mie
|
930 |
+
- mif
|
931 |
+
- mih
|
932 |
+
- mil
|
933 |
+
- mim
|
934 |
+
- min
|
935 |
+
- mio
|
936 |
+
- mip
|
937 |
+
- miq
|
938 |
+
- mit
|
939 |
+
- miy
|
940 |
+
- miz
|
941 |
+
- mjl
|
942 |
+
- mjv
|
943 |
+
- mkd
|
944 |
+
- mkl
|
945 |
+
- mkn
|
946 |
+
- mlg
|
947 |
+
- mlt
|
948 |
+
- mmg
|
949 |
+
- mnb
|
950 |
+
- mnf
|
951 |
+
- mnk
|
952 |
+
- mnw
|
953 |
+
- mnx
|
954 |
+
- moa
|
955 |
+
- mog
|
956 |
+
- mon
|
957 |
+
- mop
|
958 |
+
- mor
|
959 |
+
- mos
|
960 |
+
- mox
|
961 |
+
- moz
|
962 |
+
- mpg
|
963 |
+
- mpm
|
964 |
+
- mpp
|
965 |
+
- mpx
|
966 |
+
- mqb
|
967 |
+
- mqf
|
968 |
+
- mqj
|
969 |
+
- mqn
|
970 |
+
- mri
|
971 |
+
- mrw
|
972 |
+
- msy
|
973 |
+
- mtd
|
974 |
+
- mtj
|
975 |
+
- mto
|
976 |
+
- muh
|
977 |
+
- mup
|
978 |
+
- mur
|
979 |
+
- muv
|
980 |
+
- muy
|
981 |
+
- mvp
|
982 |
+
- mwq
|
983 |
+
- mwv
|
984 |
+
- mxb
|
985 |
+
- mxq
|
986 |
+
- mxt
|
987 |
+
- mxv
|
988 |
+
- mya
|
989 |
+
- myb
|
990 |
+
- myk
|
991 |
+
- myl
|
992 |
+
- myv
|
993 |
+
- myx
|
994 |
+
- myy
|
995 |
+
- mza
|
996 |
+
- mzi
|
997 |
+
- mzj
|
998 |
+
- mzk
|
999 |
+
- mzm
|
1000 |
+
- mzw
|
1001 |
+
- nab
|
1002 |
+
- nag
|
1003 |
+
- nan
|
1004 |
+
- nas
|
1005 |
+
- naw
|
1006 |
+
- nca
|
1007 |
+
- nch
|
1008 |
+
- ncj
|
1009 |
+
- ncl
|
1010 |
+
- ncu
|
1011 |
+
- ndj
|
1012 |
+
- ndp
|
1013 |
+
- ndv
|
1014 |
+
- ndy
|
1015 |
+
- ndz
|
1016 |
+
- neb
|
1017 |
+
- new
|
1018 |
+
- nfa
|
1019 |
+
- nfr
|
1020 |
+
- nga
|
1021 |
+
- ngl
|
1022 |
+
- ngp
|
1023 |
+
- ngu
|
1024 |
+
- nhe
|
1025 |
+
- nhi
|
1026 |
+
- nhu
|
1027 |
+
- nhw
|
1028 |
+
- nhx
|
1029 |
+
- nhy
|
1030 |
+
- nia
|
1031 |
+
- nij
|
1032 |
+
- nim
|
1033 |
+
- nin
|
1034 |
+
- nko
|
1035 |
+
- nlc
|
1036 |
+
- nld
|
1037 |
+
- nlg
|
1038 |
+
- nlk
|
1039 |
+
- nmz
|
1040 |
+
- nnb
|
1041 |
+
- nno
|
1042 |
+
- nnq
|
1043 |
+
- nnw
|
1044 |
+
- noa
|
1045 |
+
- nob
|
1046 |
+
- nod
|
1047 |
+
- nog
|
1048 |
+
- not
|
1049 |
+
- npi
|
1050 |
+
- npl
|
1051 |
+
- npy
|
1052 |
+
- nso
|
1053 |
+
- nst
|
1054 |
+
- nsu
|
1055 |
+
- ntm
|
1056 |
+
- ntr
|
1057 |
+
- nuj
|
1058 |
+
- nus
|
1059 |
+
- nuz
|
1060 |
+
- nwb
|
1061 |
+
- nxq
|
1062 |
+
- nya
|
1063 |
+
- nyf
|
1064 |
+
- nyn
|
1065 |
+
- nyo
|
1066 |
+
- nyy
|
1067 |
+
- nzi
|
1068 |
+
- obo
|
1069 |
+
- oci
|
1070 |
+
- ojb-script_latin
|
1071 |
+
- ojb-script_syllabics
|
1072 |
+
- oku
|
1073 |
+
- old
|
1074 |
+
- omw
|
1075 |
+
- onb
|
1076 |
+
- ood
|
1077 |
+
- orm
|
1078 |
+
- ory
|
1079 |
+
- oss
|
1080 |
+
- ote
|
1081 |
+
- otq
|
1082 |
+
- ozm
|
1083 |
+
- pab
|
1084 |
+
- pad
|
1085 |
+
- pag
|
1086 |
+
- pam
|
1087 |
+
- pan
|
1088 |
+
- pao
|
1089 |
+
- pap
|
1090 |
+
- pau
|
1091 |
+
- pbb
|
1092 |
+
- pbc
|
1093 |
+
- pbi
|
1094 |
+
- pce
|
1095 |
+
- pcm
|
1096 |
+
- peg
|
1097 |
+
- pez
|
1098 |
+
- pib
|
1099 |
+
- pil
|
1100 |
+
- pir
|
1101 |
+
- pis
|
1102 |
+
- pjt
|
1103 |
+
- pkb
|
1104 |
+
- pls
|
1105 |
+
- plw
|
1106 |
+
- pmf
|
1107 |
+
- pny
|
1108 |
+
- poh-dialect_eastern
|
1109 |
+
- poh-dialect_western
|
1110 |
+
- poi
|
1111 |
+
- pol
|
1112 |
+
- por
|
1113 |
+
- poy
|
1114 |
+
- ppk
|
1115 |
+
- pps
|
1116 |
+
- prf
|
1117 |
+
- prk
|
1118 |
+
- prt
|
1119 |
+
- pse
|
1120 |
+
- pss
|
1121 |
+
- ptu
|
1122 |
+
- pui
|
1123 |
+
- pus
|
1124 |
+
- pwg
|
1125 |
+
- pww
|
1126 |
+
- pxm
|
1127 |
+
- qub
|
1128 |
+
- quc-dialect_central
|
1129 |
+
- quc-dialect_east
|
1130 |
+
- quc-dialect_north
|
1131 |
+
- quf
|
1132 |
+
- quh
|
1133 |
+
- qul
|
1134 |
+
- quw
|
1135 |
+
- quy
|
1136 |
+
- quz
|
1137 |
+
- qvc
|
1138 |
+
- qve
|
1139 |
+
- qvh
|
1140 |
+
- qvm
|
1141 |
+
- qvn
|
1142 |
+
- qvo
|
1143 |
+
- qvs
|
1144 |
+
- qvw
|
1145 |
+
- qvz
|
1146 |
+
- qwh
|
1147 |
+
- qxh
|
1148 |
+
- qxl
|
1149 |
+
- qxn
|
1150 |
+
- qxo
|
1151 |
+
- qxr
|
1152 |
+
- rah
|
1153 |
+
- rai
|
1154 |
+
- rap
|
1155 |
+
- rav
|
1156 |
+
- raw
|
1157 |
+
- rej
|
1158 |
+
- rel
|
1159 |
+
- rgu
|
1160 |
+
- rhg
|
1161 |
+
- rif-script_arabic
|
1162 |
+
- rif-script_latin
|
1163 |
+
- ril
|
1164 |
+
- rim
|
1165 |
+
- rjs
|
1166 |
+
- rkt
|
1167 |
+
- rmc-script_cyrillic
|
1168 |
+
- rmc-script_latin
|
1169 |
+
- rmo
|
1170 |
+
- rmy-script_cyrillic
|
1171 |
+
- rmy-script_latin
|
1172 |
+
- rng
|
1173 |
+
- rnl
|
1174 |
+
- roh-dialect_sursilv
|
1175 |
+
- roh-dialect_vallader
|
1176 |
+
- rol
|
1177 |
+
- ron
|
1178 |
+
- rop
|
1179 |
+
- rro
|
1180 |
+
- rub
|
1181 |
+
- ruf
|
1182 |
+
- rug
|
1183 |
+
- run
|
1184 |
+
- rus
|
1185 |
+
- sab
|
1186 |
+
- sag
|
1187 |
+
- sah
|
1188 |
+
- saj
|
1189 |
+
- saq
|
1190 |
+
- sas
|
1191 |
+
- sat
|
1192 |
+
- sba
|
1193 |
+
- sbd
|
1194 |
+
- sbl
|
1195 |
+
- sbp
|
1196 |
+
- sch
|
1197 |
+
- sck
|
1198 |
+
- sda
|
1199 |
+
- sea
|
1200 |
+
- seh
|
1201 |
+
- ses
|
1202 |
+
- sey
|
1203 |
+
- sgb
|
1204 |
+
- sgj
|
1205 |
+
- sgw
|
1206 |
+
- shi
|
1207 |
+
- shk
|
1208 |
+
- shn
|
1209 |
+
- sho
|
1210 |
+
- shp
|
1211 |
+
- sid
|
1212 |
+
- sig
|
1213 |
+
- sil
|
1214 |
+
- sja
|
1215 |
+
- sjm
|
- sld
- slk
- slu
- slv
- sml
- smo
- sna
- snd
- sne
- snn
- snp
- snw
- som
- soy
- spa
- spp
- spy
- sqi
- sri
- srm
- srn
- srp-script_cyrillic
- srp-script_latin
- srx
- stn
- stp
- suc
- suk
- sun
- sur
- sus
- suv
- suz
- swe
- swh
- sxb
- sxn
- sya
- syl
- sza
- tac
- taj
- tam
- tao
- tap
- taq
- tat
- tav
- tbc
- tbg
- tbk
- tbl
- tby
- tbz
- tca
- tcc
- tcs
- tcz
- tdj
- ted
- tee
- tel
- tem
- teo
- ter
- tes
- tew
- tex
- tfr
- tgj
- tgk
- tgl
- tgo
- tgp
- tha
- thk
- thl
- tih
- tik
- tir
- tkr
- tlb
- tlj
- tly
- tmc
- tmf
- tna
- tng
- tnk
- tnn
- tnp
- tnr
- tnt
- tob
- toc
- toh
- tom
- tos
- tpi
- tpm
- tpp
- tpt
- trc
- tri
- trn
- trs
- tso
- tsz
- ttc
- tte
- ttq-script_tifinagh
- tue
- tuf
- tuk-script_arabic
- tuk-script_latin
- tuo
- tur
- tvw
- twb
- twe
- twu
- txa
- txq
- txu
- tye
- tzh-dialect_bachajon
- tzh-dialect_tenejapa
- tzj-dialect_eastern
- tzj-dialect_western
- tzo-dialect_chamula
- tzo-dialect_chenalho
- ubl
- ubu
- udm
- udu
- uig-script_arabic
- uig-script_cyrillic
- ukr
- umb
- unr
- upv
- ura
- urb
- urd-script_arabic
- urd-script_devanagari
- urd-script_latin
- urk
- urt
- ury
- usp
- uzb-script_cyrillic
- uzb-script_latin
- vag
- vid
- vie
- vif
- vmw
- vmy
- vot
- vun
- vut
- wal-script_ethiopic
- wal-script_latin
- wap
- war
- waw
- way
- wba
- wlo
- wlx
- wmw
- wob
- wol
- wsg
- wwa
- xal
- xdy
- xed
- xer
- xho
- xmm
- xnj
- xnr
- xog
- xon
- xrb
- xsb
- xsm
- xsr
- xsu
- xta
- xtd
- xte
- xtm
- xtn
- xua
- xuo
- yaa
- yad
- yal
- yam
- yao
- yas
- yat
- yaz
- yba
- ybb
- ycl
- ycn
- yea
- yka
- yli
- yor
- yre
- yua
- yue-script_traditional
- yuz
- yva
- zaa
- zab
- zac
- zad
- zae
- zai
- zam
- zao
- zaq
- zar
- zas
- zav
- zaw
- zca
- zga
- zim
- ziw
- zlm
- zmz
- zne
- zos
- zpc
- zpg
- zpi
- zpl
- zpm
- zpo
- zpt
- zpu
- zpz
- ztq
- zty
- zul
- zyb
- zyp
- zza

</details>
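
The identifiers in the list above appear to follow a simple pattern: a three-letter language code, optionally followed by `-script_<name>` or `-dialect_<name>` (for example `srp-script_cyrillic` or `tzh-dialect_bachajon`). The following is a minimal sketch, assuming that pattern holds for every entry, of how such an identifier could be split into its parts. `LanguageId` and `parse_language_id` are hypothetical helper names used for illustration only; they are not part of MMS or Transformers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguageId:
    """Parts of a language identifier (hypothetical helper type)."""
    code: str                      # three-letter language code, e.g. "srp"
    script: Optional[str] = None   # e.g. "cyrillic", if a script is given
    dialect: Optional[str] = None  # e.g. "bachajon", if a dialect is given


def parse_language_id(identifier: str) -> LanguageId:
    """Split an identifier such as 'srp-script_cyrillic' into its parts.

    Assumes the layout '<code>', '<code>-script_<name>' or
    '<code>-dialect_<name>' seen in the list above.
    """
    code, _, qualifier = identifier.partition("-")
    if not qualifier:
        return LanguageId(code)
    kind, _, value = qualifier.partition("_")
    if kind == "script":
        return LanguageId(code, script=value)
    if kind == "dialect":
        return LanguageId(code, dialect=value)
    raise ValueError(f"unrecognized qualifier in {identifier!r}")


for name in ("swh", "srp-script_cyrillic", "tzh-dialect_bachajon"):
    print(parse_language_id(name))
```

Grouping the parsed results by `code` is one way to see which languages are listed with more than one script or dialect, such as `urd` or `tzo`.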

## Model details

- **Developed by:** Vineel Pratap et al.
- **Model type:** Multilingual automatic speech recognition model
- **Language(s):** 1107+ languages, see [supported languages](#supported-languages)
- **License:** CC-BY-NC 4.0
- **Number of parameters:** 1 billion
- **Cite as:**

      @article{pratap2023mms,
        title={Scaling Speech Technology to 1,000+ Languages},
        author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
        journal={arXiv},
        year={2023}
      }

## Additional Links

- [Blog post]( )
- [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms)
- [Paper](https://arxiv.org/abs/2305.13516)
- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
- [Other **MMS** checkpoints](https://huggingface.co/models?other=mms)
- [Official Space](https://huggingface.co/spaces/facebook/MMS)