---
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_10_0
base_model:
- facebook/wav2vec2-xls-r-300m
tags:
- pytorch
- phoneme-recognition
pipeline_tag: automatic-speech-recognition
---

Model Information
=================

Allophant is a multilingual phoneme recognizer trained on spoken sentences in 34 languages, capable of generalizing zero-shot to unseen phoneme inventories.

The model is based on [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) and was pre-trained on a subset of the [Common Voice Corpus 10.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_10_0) transcribed with [eSpeak NG](https://github.com/espeak-ng/espeak-ng).

| Model Name | UCLA Phonetic Corpus (PER) | UCLA Phonetic Corpus (AER) | Common Voice (PER) | Common Voice (AER) |
| ---------------- | ---------: | ---------: | -------: | -------: |
| [Multitask](https://huggingface.co/kgnlp/allophant) | **45.62%** | 19.44% | **34.34%** | **8.36%** |
| [Hierarchical](https://huggingface.co/kgnlp/allophant-hierarchical) | 46.09% | **19.18%** | 34.35% | 8.56% |
| [Multitask Shared](https://huggingface.co/kgnlp/allophant-shared) | 46.05% | 19.52% | 41.20% | 8.88% |
| **Baseline Shared** | 48.25% | - | 45.35% | - |
| [Baseline](https://huggingface.co/kgnlp/allophant-baseline) | 57.01% | - | 46.95% | - |

Note that our baseline models were trained without phonetic feature classifiers and therefore only support phoneme recognition.

Citation
========

```bibtex
@inproceedings{glocker2023allophant,
    title={Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes},
    author={Glocker, Kevin and Herygers, Aaricia and Georges, Munir},
    year={2023},
    booktitle={{Proc. Interspeech 2023}},
    month={8}}
```
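
Phoneme Error Rate Sketch
=========================

The PER columns in the results table report phoneme error rates. As a rough illustration of how such a number is commonly computed, the sketch below divides the total Levenshtein (edit) distance between reference and predicted phoneme sequences by the total reference length. This is an assumption about the metric's usual definition, not the evaluation code behind the table: the function names `edit_distance` and `phoneme_error_rate`, the corpus-level aggregation, and the example transcriptions are all hypothetical.

```python
from typing import Sequence


def edit_distance(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences with unit costs."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis phonemes.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_phoneme in enumerate(reference, start=1):
        current = [i]
        for j, hyp_phoneme in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_phoneme != hyp_phoneme)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def phoneme_error_rate(
    references: Sequence[Sequence[str]],
    hypotheses: Sequence[Sequence[str]],
) -> float:
    """Total edit distance divided by total reference length.

    Assumption: errors are pooled over the whole corpus rather than
    averaged per utterance.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_length = sum(len(reference) for reference in references)
    if total_length == 0:
        raise ValueError("references must contain at least one phoneme")
    total_errors = sum(
        edit_distance(reference, hypothesis)
        for reference, hypothesis in zip(references, hypotheses)
    )
    return total_errors / total_length


# Hypothetical transcriptions: one substitution and one deletion
# across eight reference phonemes.
references = [["h", "ə", "l", "oʊ"], ["w", "ɜː", "l", "d"]]
hypotheses = [["h", "e", "l", "oʊ"], ["w", "ɜː", "d"]]
print(f"PER: {phoneme_error_rate(references, hypotheses):.2%}")
```

Phonemes are kept as lists of strings rather than a single string so that multi-character symbols such as `oʊ` count as one unit. The AER columns are not covered by this sketch.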