---
license: cc-by-nc-4.0
---

# ECAPA2 Speaker Embedding Extractor

Link to paper: [ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings](https://arxiv.org/abs/2401.08342).

ECAPA2 is a hybrid neural network architecture and training strategy for generating robust speaker embeddings. The provided pre-trained model has an easy-to-use API to extract speaker embeddings and other hierarchical features. More information can be found in our original ECAPA2 paper.

## Usage Guide

### Download model

You need to install the `huggingface_hub` package to download the ECAPA2 model:

```bash
pip install --upgrade huggingface_hub
```

Or with Conda:

```bash
conda install -c conda-forge huggingface_hub
```

Download model:

```python
from huggingface_hub import hf_hub_download

# automatically checks for cached file, optionally set `cache_dir` location
model_file = hf_hub_download(repo_id='Jenthe/ECAPA2', filename='ecapa2.pt', cache_dir=None)
```

### Speaker Embedding Extraction

Extracting speaker embeddings only requires a few lines of code:

```python
import torch
import torchaudio

ecapa2 = torch.jit.load(model_file, map_location='cpu')
audio, sr = torchaudio.load('sample.wav') # sample rate of 16 kHz expected

embedding = ecapa2(audio)
```

For faster, 16-bit half-precision CUDA inference (recommended):

```python
import torch
import torchaudio

ecapa2 = torch.jit.load(model_file, map_location='cuda')
ecapa2.half() # optional, but results in faster inference
audio, sr = torchaudio.load('sample.wav') # sample rate of 16 kHz expected

embedding = ecapa2(audio)
```

The initial calls to the JIT model can in some cases take a very long time because the compiler attempts to optimize the model. If this causes issues, the JIT optimizer can be disabled as follows:

```python
with torch.jit.optimized_execution(False):
    embedding = ecapa2(audio)
```

There is no need to call `ecapa2.eval()` or to wrap inference in `torch.no_grad()`; both are handled automatically.

## Citation

**BibTeX:**

```
@INPROCEEDINGS{ecapa2,
  author={Jenthe Thienpondt and Kris Demuynck},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  title={ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings},
  year={2023},
  volume={},
  number={}
}
```

**APA:**

```
Jenthe Thienpondt, Kris Demuynck (2023). ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings.
In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
```

## Contact

Name: Jenthe Thienpondt\
E-mail: jenthe.thienpondt@ugent.be
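
## Appendix: Comparing Two Embeddings

The Usage Guide shows how to extract an embedding but not how to compare two of them. Cosine similarity is a common choice for scoring speaker embeddings; this README does not name a scoring method, so treat the following as a minimal sketch under that assumption. The helper `cosine_similarity` and the toy vectors are hypothetical and are not part of the ECAPA2 API. The function is written in plain Python so it runs without the model. Assuming the model returns a tensor holding a single embedding, it could be converted first with `embedding.flatten().tolist()`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
emb_a = [0.1, 0.3, -0.2]
emb_b = [0.2, 0.6, -0.4]  # same direction as emb_a
emb_c = [-0.3, 0.1, 0.0]  # orthogonal to emb_a

score_same = cosine_similarity(emb_a, emb_b)
score_diff = cosine_similarity(emb_a, emb_c)
```

A higher score suggests the two utterances are more likely from the same speaker. Any decision threshold would have to be tuned on held-out data and is not given here.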