CTC-Baseline MMS-based ASR model - set 1

This repository contains a CTC-baseline, MMS-based automatic speech recognition (ASR) model trained with ESPnet on balanced training data from set 1.

Intended Use

This model is intended for ASR. Users can run inference using the provided checkpoint (valid.loss.best.pth) and configuration file (config.yaml):

import soundfile as sf
from espnet2.bin.asr_inference import Speech2Text

asr_train_config = "ctc-baseline_mms_set_1/config.yaml"
asr_model_file = "ctc-baseline_mms_set_1/valid.loss.best.pth"

# Build the inference wrapper from the training config and checkpoint.
model = Speech2Text.from_pretrained(
    asr_train_config=asr_train_config,
    asr_model_file=asr_model_file,
)

# Decode a waveform; model() returns an n-best list, each entry a
# (text, tokens, token_ints, hypothesis) tuple, so take the top result.
speech, _ = sf.read("input.wav")
text, *_ = model(speech)[0]

print("Recognized text:", text)

How to Use

  1. Clone this repository.
  2. Use ESPnet’s inference scripts with the provided config.yaml and checkpoint file.
  3. Ensure any external resources referenced in config.yaml are available at the indicated relative paths.