license: cc-by-nc-4.0
tags:
  - mms

Massively Multilingual Speech (MMS) - Common Crawl Language Models

This repository contains n-gram language models trained on Common Crawl data (Conneau et al. 2020b, NLLB_Team et al. 2022) using the KenLM library.

Table of Contents

Example


TODO
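
Until an official example is provided, the following is a minimal sketch of how one of these n-gram models might be queried with the `kenlm` Python bindings. It assumes the bindings are installed and that a model file has already been downloaded from this repository; the file name `eng.bin` is a hypothetical placeholder, not a confirmed path in this repository. The helper `log10_to_perplexity` is introduced here for illustration only.

```python
from typing import Tuple


def log10_to_perplexity(log10_prob: float, num_words: int) -> float:
    """Convert a total log10 probability into per-token perplexity.

    The end-of-sentence token is counted as one extra token, which is
    why the denominator is num_words + 1.
    """
    return 10.0 ** (-log10_prob / (num_words + 1))


def score_sentence(model_path: str, sentence: str) -> Tuple[float, float]:
    """Return (log10 probability, perplexity) of a whitespace-tokenized sentence."""
    # Imported lazily so the helper above works without the bindings installed.
    import kenlm

    model = kenlm.Model(model_path)
    # KenLM scores are log10 probabilities; bos/eos add sentence boundary tokens.
    log10_prob = model.score(sentence, bos=True, eos=True)
    return log10_prob, log10_to_perplexity(log10_prob, len(sentence.split()))


if __name__ == "__main__":
    # Hypothetical local path; replace with the file actually downloaded.
    try:
        print(score_sentence("eng.bin", "this is a test"))
    except Exception as err:  # bindings missing or model file not found
        print(f"Could not score sentence: {err}")
```

Lower perplexity means the model finds the sentence more likely. The text passed in should be normalized the same way as the data the model was trained on; that normalization is not described here, so treat the raw string above as an assumption.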

Supported Languages

Language models are available for 102 languages, listed below by ISO 639-3 code. More details about the languages and their ISO 639-3 codes are in the MMS Language Coverage Overview.

  • afr
  • amh
  • ara
  • asm
  • ast
  • azj
  • bel
  • ben
  • bos
  • bul
  • cat
  • ceb
  • ces
  • ckb
  • cmn
  • cym
  • dan
  • deu
  • ell
  • eng
  • est
  • fas
  • fin
  • fra
  • ful
  • gle
  • glg
  • guj
  • hau
  • heb
  • hin
  • hrv
  • hun
  • hye
  • ibo
  • ind
  • isl
  • ita
  • jav
  • jpn
  • kam
  • kan
  • kat
  • kaz
  • kea
  • khm
  • kir
  • kor
  • lao
  • lav
  • lin
  • lit
  • ltz
  • lug
  • luo
  • mal
  • mar
  • mkd
  • mlt
  • mon
  • mri
  • mya
  • nld
  • nob
  • npi
  • nso
  • nya
  • oci
  • orm
  • ory
  • pan
  • pol
  • por
  • pus
  • ron
  • rus
  • slk
  • slv
  • sna
  • snd
  • som
  • spa
  • srp
  • swe
  • swh
  • tam
  • tel
  • tgk
  • tgl
  • tha
  • tur
  • ukr
  • umb
  • urd
  • uzb
  • vie
  • wol
  • xho
  • yor
  • yue
  • zlm
  • zul

Model details

  • Developed by: Vineel Pratap et al.

  • Model type: n-gram language models (KenLM) accompanying the MMS multilingual automatic speech recognition models

  • Language(s): 102 languages; see Supported Languages

  • License: CC-BY-NC 4.0

  • Num parameters: 1 billion

  • Audio sampling rate: 16,000 Hz (16 kHz)

  • Cite as:

    @article{pratap2023mms,
      title={Scaling Speech Technology to 1,000+ Languages},
      author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
      journal={arXiv},
      year={2023}
    }

Additional Links