Conformer-based spoken language identification model

Summary

This is a conformer-based streaming language identification model with attentive temporal pooling.

The model was trained with public data only.

The paper: https://arxiv.org/abs/2202.12163

@inproceedings{wang2022attentive,
  title={Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech},
  author={Quan Wang and Yang Yu and Jason Pelecanos and Yiling Huang and Ignacio Lopez Moreno},
  booktitle={Odyssey: The Speaker and Language Recognition Workshop},
  year={2022}
}
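To give a rough intuition for the "attentive temporal pooling" named in the summary, here is a minimal NumPy sketch of a generic attention-weighted average over frame embeddings. It is an illustration only, not the formulation from the paper or the code inside the released model; the function name `attentive_mean_pool` and the `score_vector` parameter are hypothetical.

```python
import numpy as np


def attentive_mean_pool(frames: np.ndarray, score_vector: np.ndarray) -> np.ndarray:
    """Attention-weighted average of frame embeddings (illustrative sketch).

    frames:       array of shape (T, D), one D-dimensional embedding per frame.
    score_vector: array of shape (D,), a hypothetical learned parameter that
                  assigns each frame a scalar relevance score.
    """
    scores = frames @ score_vector        # (T,) one score per frame
    scores = scores - scores.max()        # subtract max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()     # softmax over the time axis
    return weights @ frames               # (D,) weighted mean over time
```

With an all-zero `score_vector` every frame receives the same weight, so the result reduces to plain mean pooling; larger scores shift the average toward the frames the scorer considers more informative.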

Usage

To use this model, you will need the sidlingvo library: https://github.com/google/speaker-id/tree/master/lingvo

Since lingvo does not support Python 3.11 yet, make sure your Python version is 3.10 or earlier.
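If you want to check the interpreter version from a script before installing, a small helper along these lines may be useful. This is only a sketch: the function name `python_is_supported` is hypothetical, and the 3.10 ceiling is taken from the note above rather than from any library metadata.

```python
import sys


def python_is_supported(version=None):
    """Return True if the version is a Python 3 release no newer than 3.10.

    version: a (major, minor, ...) tuple; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info
    major, minor = version[0], version[1]
    return major == 3 and minor <= 10
```

Calling `python_is_supported()` with no arguments checks the interpreter that runs the script.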

Install the library:

pip install sidlingvo

Example usage:

import os
from sidlingvo import wav_to_lang
from huggingface_hub import hf_hub_download

# Download the VAD model, its mean/stddev file, and the language ID model
# from the Hugging Face Hub into a local "models" directory.
repo_id = "tflite-hub/conformer-lang-id"
model_path = "models"
hf_hub_download(repo_id=repo_id, filename="vad_short_model.tflite", local_dir=model_path)
hf_hub_download(repo_id=repo_id, filename="vad_short_mean_stddev.csv", local_dir=model_path)
hf_hub_download(repo_id=repo_id, filename="conformer_langid_medium.tflite", local_dir=model_path)

# Replace with the path to the WAV file you want to classify.
wav_file = "your_wav_file.wav"
runner = wav_to_lang.WavToLangRunner(
    vad_model_file=os.path.join(model_path, "vad_short_model.tflite"),
    vad_mean_stddev_file=os.path.join(model_path, "vad_short_mean_stddev.csv"),
    langid_model_file=os.path.join(model_path, "conformer_langid_medium.tflite"))

# The first returned value is the top language; the second is ignored here.
top_lang, _ = runner.wav_to_lang(wav_file)
print("Predicted language:", top_lang)
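To classify several files with one runner, the same call can be wrapped in a loop. The sketch below assumes only what the example above shows, namely that the prediction function takes a file path and returns a pair whose first element is the top language. The helper name `identify_languages` is hypothetical and is not part of sidlingvo.

```python
from typing import Callable, Dict, Iterable, Tuple


def identify_languages(
    wav_paths: Iterable,
    predict: Callable[[str], Tuple],
) -> Dict[str, str]:
    """Map each WAV path to its predicted top language.

    predict is any callable that behaves like runner.wav_to_lang in the
    example above: it takes a path and returns a 2-tuple whose first
    element is the top language.
    """
    results = {}
    for path in wav_paths:
        top_lang, _ = predict(str(path))  # second value is ignored here
        results[str(path)] = top_lang
    return results
```

With the `runner` from the example above, this could be called as `identify_languages(["a.wav", "b.wav"], runner.wav_to_lang)`.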