facebook/mms-cclms

vineelpratap committed on Jun 13, 2023

Commit 92a7c8a, 1 parent: a0e83b7

Update README.md

README.md +160 -0

@@ -1,3 +1,163 @@
 ---
 license: cc-by-nc-4.0
+tags:
+- mms
 ---
+# Massively Multilingual Speech (MMS) - Common Crawl Language Models
+This repository contains n-gram language models trained on Common Crawl data ([Conneau et al. 2020b](https://aclanthology.org/2020.acl-main.747/), [NLLB_Team et al. 2022](https://arxiv.org/abs/2207.04672)) using the [KenLM](https://github.com/kpu/kenlm) library.
+## Table of Contents
+- [Example](#example)
+- [Supported Languages](#supported-languages)
+- [Model details](#model-details)
+- [Additional links](#additional-links)
+## Example
+```py
+TODO
+```
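The README's example is still a `TODO`. Until it is filled in, the following is a minimal sketch of how a sentence could be scored with one of these models through the `kenlm` Python bindings. It assumes `kenlm` is installed and that a model file from this repository has already been downloaded; the path `path/to/lm.bin` and the helper names `score_sentence` and `log10_to_perplexity` are hypothetical and do not come from the repository.

```python
from typing import Tuple


def log10_to_perplexity(log10_prob: float, num_tokens: int) -> float:
    """Convert a total log10 probability into a per-token perplexity."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return 10.0 ** (-log10_prob / num_tokens)


def score_sentence(model_path: str, sentence: str) -> Tuple[float, float]:
    """Return (log10 probability, perplexity) of a sentence under a KenLM model.

    `model_path` is a hypothetical local path to a downloaded model file.
    """
    # Third-party dependency, imported lazily so the helper above
    # can be used without kenlm installed.
    import kenlm

    model = kenlm.Model(model_path)
    # KenLM scores whitespace-separated tokens and returns a log10 probability;
    # bos/eos add the sentence boundary tokens.
    log10_prob = model.score(sentence, bos=True, eos=True)
    # Count the end-of-sentence token alongside the words.
    num_tokens = len(sentence.split()) + 1
    return log10_prob, log10_to_perplexity(log10_prob, num_tokens)


# Example call (assumption: the file exists locally):
# log10_prob, ppl = score_sentence("path/to/lm.bin", "this is a test")
```

The text passed to `score_sentence` should be tokenized and normalized the same way as the training data for the chosen language, which this commit does not document.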
+## Supported Languages
+Language models are available for 102 languages. Click the following to toggle the full list of languages supported by this repository, given as [ISO 639-3 codes](https://en.wikipedia.org/wiki/ISO_639-3).
+You can find more details about the languages and their ISO 639-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).
+<details>
+  <summary>Click to toggle</summary>
+ - afr
+ - amh
+ - ara
+ - asm
+ - ast
+ - azj
+ - bel
+ - ben
+ - bos
+ - bul
+ - cat
+ - ceb
+ - ces
+ - ckb
+ - cmn
+ - cym
+ - dan
+ - deu
+ - ell
+ - eng
+ - est
+ - fas
+ - fin
+ - fra
+ - ful
+ - gle
+ - glg
+ - guj
+ - hau
+ - heb
+ - hin
+ - hrv
+ - hun
+ - hye
+ - ibo
+ - ind
+ - isl
+ - ita
+ - jav
+ - jpn
+ - kam
+ - kan
+ - kat
+ - kaz
+ - kea
+ - khm
+ - kir
+ - kor
+ - lao
+ - lav
+ - lin
+ - lit
+ - ltz
+ - lug
+ - luo
+ - mal
+ - mar
+ - mkd
+ - mlt
+ - mon
+ - mri
+ - mya
+ - nld
+ - nob
+ - npi
+ - nso
+ - nya
+ - oci
+ - orm
+ - ory
+ - pan
+ - pol
+ - por
+ - pus
+ - ron
+ - rus
+ - slk
+ - slv
+ - sna
+ - snd
+ - som
+ - spa
+ - srp
+ - swe
+ - swh
+ - tam
+ - tel
+ - tgk
+ - tgl
+ - tha
+ - tur
+ - ukr
+ - umb
+ - urd
+ - uzb
+ - vie
+ - wol
+ - xho
+ - yor
+ - yue
+ - zlm
+ - zul
+</details>
+## Model details
+- **Developed by:** Vineel Pratap et al.
+- **Model type:** n-gram language models trained with KenLM
+- **Language(s):** 102 languages, see [supported languages](#supported-languages)
+- **License:** CC-BY-NC 4.0 license
+- **Num parameters:** 1 billion
+- **Audio sampling rate:** 16,000 Hz
+- **Cite as:**
+      @article{pratap2023mms,
+        title={Scaling Speech Technology to 1,000+ Languages},
+        author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
+      journal={arXiv},
+      year={2023}
+      }
+## Additional Links
+- [Blog post](https://ai.facebook.com/blog/multilingual-model-speech-recognition/)
+- [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms)
+- [Paper](https://arxiv.org/abs/2305.13516)
+- [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
+- [Other **MMS** checkpoints](https://huggingface.co/models?other=mms)
+- MMS base checkpoints:
+  - [facebook/mms-1b](https://huggingface.co/facebook/mms-1b)
+  - [facebook/mms-300m](https://huggingface.co/facebook/mms-300m)
+- [Official Space](https://huggingface.co/spaces/facebook/MMS)