End-to-end Neural Diarization with Encoder-Decoder Based Attractors, trained on AMI-headset. This example can be found at egs2/ami/diar1.

Configurations:

  • Use ESPnet's default frontend to extract features. The sampling rate is 8000 Hz, with a frame length of 25 ms and a frame shift of 10 ms. The frontend extracts 23 log-scaled Mel-filterbank features.
  • Use a 4-layer stacked Transformer encoder; each layer outputs 256-dimensional frame-wise embeddings.
  • Use ESPnet's standard RNN attractor (LSTM) with a hidden size of 256.
  • Initial training uses data with 4 speakers for 500 epochs, following spk4/diar_train_diar_eda_raw_spk4/config.yaml.
  • Adaptation fine-tunes the model on data with 3 and 5 speakers for 20 epochs each, using spk3/diar_train_diar_eda_raw_spk3/config.yaml and spk5/diar_train_diar_eda_raw_spk5/config.yaml respectively.
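
As a rough check on the frontend settings above, the frame length and frame shift can be converted from milliseconds to samples. This is a minimal sketch: the helper names `ms_to_samples` and `num_frames` are hypothetical, and the frame count assumes no padding at the signal edges, which may differ from what the ESPnet frontend actually does.

```python
# Frontend settings taken from the configuration list above.
SAMPLE_RATE_HZ = 8000
FRAME_LENGTH_MS = 25
FRAME_SHIFT_MS = 10


def ms_to_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate_hz * duration_ms // 1000


def num_frames(num_samples, frame_length, frame_shift):
    """Count full frames in a signal, assuming no edge padding (an assumption)."""
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // frame_shift


frame_length = ms_to_samples(FRAME_LENGTH_MS)
frame_shift = ms_to_samples(FRAME_SHIFT_MS)
frames_per_second = num_frames(SAMPLE_RATE_HZ, frame_length, frame_shift)

print(frame_length, frame_shift, frames_per_second)
```

With these settings each frame spans 200 samples and consecutive frames start 80 samples apart, so one second of audio yields roughly 100 frames.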

RESULTS

The following results were obtained using the checkpoint spk5/diar_train_diar_eda_raw_spk5/20epoch.pth, evaluated on the 4-speaker test and development sets.

Environments

  • date: Thu Dec 19 22:43:37 EST 2024
  • python version: 3.11.10 (main, Oct 3 2024, 07:29:13) [GCC 11.2.0]
  • espnet version: espnet 202409
  • pytorch version: pytorch 2.4.0
  • Git hash: c12b3d59ca4fd8847edf274e56a1716474d2a30e
    • Commit date: Thu Dec 19 21:58:26 2024 -0500

spk4

DER

diarized_test

threshold_median_collar DER
result_th0.3_med11_collar0.0 72.44
result_th0.3_med1_collar0.0 74.64
result_th0.4_med11_collar0.0 70.60
result_th0.4_med1_collar0.0 72.30
result_th0.5_med11_collar0.0 70.45
result_th0.5_med1_collar0.0 72.02
result_th0.6_med11_collar0.0 71.85
result_th0.6_med1_collar0.0 73.41
result_th0.7_med11_collar0.0 75.56
result_th0.7_med1_collar0.0 77.02
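
The result names in the table appear to encode the post-processing settings: `th` is the threshold applied to per-frame speaker activity posteriors, `med` is the median filter window in frames, and `collar` is the scoring collar. The sketch below illustrates thresholding followed by median filtering for a single speaker. It is not the recipe's scoring code: the function names are hypothetical, and the strict `>` comparison and zero padding at the edges are assumptions.

```python
from statistics import median


def binarize(posteriors, threshold):
    """Mark a frame as active when its posterior exceeds the threshold."""
    return [1 if p > threshold else 0 for p in posteriors]


def median_filter(labels, kernel_size):
    """Smooth a 0/1 sequence with an odd-sized sliding median window.

    Edges are zero-padded (an assumption), so the output length
    equals the input length.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    padded = [0] * half + list(labels) + [0] * half
    return [
        int(median(padded[i:i + kernel_size]))
        for i in range(len(labels))
    ]


# Toy posteriors for one speaker over ten frames (illustrative values only).
posteriors = [0.1, 0.8, 0.2, 0.9, 0.9, 0.9, 0.4, 0.9, 0.1, 0.1]

raw = binarize(posteriors, threshold=0.5)
smoothed = median_filter(raw, kernel_size=3)
unchanged = median_filter(raw, kernel_size=1)

print(raw)
print(smoothed)
```

A window of 1 leaves the decisions untouched, while a larger window removes isolated one-frame activations and fills short gaps inside a speech segment. That smoothing is consistent with the `med11` rows scoring a lower DER than the matching `med1` rows in the table.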

spk4

DER

diarized_dev

threshold_median_collar DER
result_th0.3_med11_collar0.0 74.37
result_th0.3_med1_collar0.0 75.96
result_th0.4_med11_collar0.0 71.69
result_th0.4_med1_collar0.0 72.94
result_th0.5_med11_collar0.0 70.83
result_th0.5_med1_collar0.0 72.12
result_th0.6_med11_collar0.0 71.96
result_th0.6_med1_collar0.0 73.34
result_th0.7_med11_collar0.0 75.81
result_th0.7_med1_collar0.0 76.99
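
One common way to read the two tables is to choose the post-processing setting on the development set and then report the test DER at that setting. The sketch below parses the rows listed above and does exactly that. The parsing helper and variable names are hypothetical and not part of the recipe.

```python
import re

# Rows copied from the diarized_dev and diarized_test tables above.
DEV_ROWS = """
result_th0.3_med11_collar0.0 74.37
result_th0.3_med1_collar0.0 75.96
result_th0.4_med11_collar0.0 71.69
result_th0.4_med1_collar0.0 72.94
result_th0.5_med11_collar0.0 70.83
result_th0.5_med1_collar0.0 72.12
result_th0.6_med11_collar0.0 71.96
result_th0.6_med1_collar0.0 73.34
result_th0.7_med11_collar0.0 75.81
result_th0.7_med1_collar0.0 76.99
"""

TEST_ROWS = """
result_th0.3_med11_collar0.0 72.44
result_th0.3_med1_collar0.0 74.64
result_th0.4_med11_collar0.0 70.60
result_th0.4_med1_collar0.0 72.30
result_th0.5_med11_collar0.0 70.45
result_th0.5_med1_collar0.0 72.02
result_th0.6_med11_collar0.0 71.85
result_th0.6_med1_collar0.0 73.41
result_th0.7_med11_collar0.0 75.56
result_th0.7_med1_collar0.0 77.02
"""

ROW_PATTERN = re.compile(
    r"result_th(?P<th>[\d.]+)_med(?P<med>\d+)_collar(?P<collar>[\d.]+)"
    r"\s+(?P<der>[\d.]+)"
)


def parse_rows(text):
    """Map (threshold, median window, collar) to DER for each table row."""
    results = {}
    for line in text.strip().splitlines():
        match = ROW_PATTERN.fullmatch(line.strip())
        if match:
            key = (
                float(match["th"]),
                int(match["med"]),
                float(match["collar"]),
            )
            results[key] = float(match["der"])
    return results


dev = parse_rows(DEV_ROWS)
test = parse_rows(TEST_ROWS)

# Select the setting with the lowest DER on dev, then look it up on test.
best_setting = min(dev, key=dev.get)
print(best_setting, dev[best_setting], test[best_setting])
```

For the numbers listed here, the lowest development DER is 70.83 at a threshold of 0.5 with a median window of 11, and the same setting gives 70.45 on the test set.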