# Model Card: Spectra-0 (anti-spoofing / bonafide vs spoof)
Spectra-0 is a speech spoofing detection model (binary classification: bonafide vs spoof) that operates on raw audio waveforms. Architecture: SSL encoder (Wav2Vec2) → MLP projection → ECAPA-TDNN two-class classifier.
- Input: waveform (`float32`) of shape `(batch, num_samples)`, typically sampled at 16 kHz.
- Output: logits of shape `(batch, 2)`, where index 0 = spoof and index 1 = bonafide.
On first run, the model automatically downloads the SSL encoder `facebook/wav2vec2-xls-r-300m` via `transformers`.
## Evaluation Results

All numbers are Equal Error Rate (EER, %; lower is better), self-reported.
| Model | ASVspoof19 LA | ASVspoof21 LA | ASVspoof21 DF | ASVspoof5 | ADD2022 | In-the-Wild |
|---|---|---|---|---|---|---|
| Res2TCNGuard | 7.487 | 19.130 | 19.883 | 37.620 | 49.538 | 49.246 |
| AASIST3 | 27.585 | 37.407 | 33.099 | 41.001 | 47.192 | 39.626 |
| XSLS | 0.231 | 7.714 | 4.220 | 17.688 | 33.951 | 7.453 |
| TCM-ADD | 0.152 | 6.655 | 3.444 | 19.505 | 35.252 | 7.767 |
| DF Arena 1B | 43.793 | 40.137 | 42.994 | 35.333 | 42.139 | 17.598 |
| Spectra-0 | 0.181 | 6.475 | 5.410 | 14.426 | 14.716 | 1.026 |
## Quickstart
### Clone from Hugging Face

This repository is hosted on the Hugging Face Hub: https://huggingface.co/MTUCI/spectra_0.

```bash
git lfs install
git clone https://huggingface.co/MTUCI/spectra_0
cd spectra_0
```
### Install dependencies

```bash
pip install -U torch torchaudio transformers huggingface_hub safetensors soundfile
```
### Single-file inference (example preprocessing)
```python
import random

import torch
import torchaudio
import soundfile as sf

from model import spectra_0


def pad_random(x: torch.Tensor, max_len: int = 64600) -> torch.Tensor:
    # x: (num_samples,) or (1, num_samples)
    if x.ndim > 1:
        x = x.squeeze()
    x_len = x.shape[0]
    if x_len >= max_len:
        # Randomly crop a max_len window from a longer waveform
        start = random.randint(0, x_len - max_len)
        return x[start:start + max_len]
    # Tile a shorter waveform until it reaches max_len
    num_repeats = int(max_len / x_len) + 1
    return x.repeat(num_repeats)[:max_len]


def load_audio_mono(path: str) -> torch.Tensor:
    audio, sr = sf.read(path, dtype="float32")
    audio = torch.from_numpy(audio)
    if audio.ndim > 1:
        # (num_samples, channels) -> mono
        audio = audio.mean(dim=1)
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    return audio


device = "cuda" if torch.cuda.is_available() else "cpu"
model = spectra_0.from_pretrained(pretrained_model_name_or_path=".").eval().to(device)

audio = load_audio_mono("path/to/audio.wav")
audio = torchaudio.functional.preemphasis(audio.unsqueeze(0))  # (1, T)
audio = pad_random(audio.squeeze(0), 64600).unsqueeze(0)       # (1, 64600)

with torch.inference_mode():
    logits = model(audio.to(device))  # (1, 2)

score_spoof = logits[0, 0].item()
score_bonafide = logits[0, 1].item()
print({"score_bonafide": score_bonafide, "score_spoof": score_spoof})
```
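If calibrated-looking outputs are preferred over raw logits, the two scores can be mapped to probabilities with a softmax. A minimal sketch, assuming the `(batch, 2)` logits layout above (index 0 = spoof, index 1 = bonafide); the example logits tensor here is illustrative, not model output:

```python
import torch
import torch.nn.functional as F

# Example logits in the model's (batch, 2) layout: index 0 = spoof, index 1 = bonafide
logits = torch.tensor([[0.3, 2.1]])
probs = F.softmax(logits, dim=-1)  # (1, 2), each row sums to 1
p_spoof, p_bonafide = probs[0].tolist()
print({"p_bonafide": round(p_bonafide, 3), "p_spoof": round(p_spoof, 3)})
```

Note that softmax probabilities inherit any miscalibration of the logits, so for decision-making the thresholding approach below is usually preferable.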
## Threshold-based classification (and how to tune it)
In `model.py`, the `Spectra0Model` class provides `classify()` with a default threshold chosen as the optimal value for the original evaluation setting:

- Default threshold: `-1.0625009` (applied to `logit_bonafide = logits[:, 1]`).
- Note: this threshold may not be optimal on a different dataset or domain. It is recommended to tune it on your own data using the EER (Equal Error Rate) or a target FAR/FRR.
Example:

```python
with torch.inference_mode():
    pred = model.classify(audio.to(device), threshold=-1.0625009)  # 1 = bonafide, 0 = spoof
```
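Equivalently, the decision can be made manually on the raw logits. A sketch assuming the layout above (the example logits tensor is illustrative, and whether `classify()` uses a strict or non-strict comparison is an assumption):

```python
import torch

threshold = -1.0625009
logits = torch.tensor([[0.3, 2.1]])       # example (batch, 2) logits
pred = (logits[:, 1] > threshold).long()  # threshold logit_bonafide: 1 = bonafide, 0 = spoof
print(pred.tolist())
```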
### Tuning the threshold via EER (typical workflow)

1. Run the model on a labeled set and collect scores for both classes.
2. Compute the EER and the threshold at which it occurs.
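The two steps above can be sketched as follows. This is a minimal illustration, not code from the repo: `eer_threshold` is a hypothetical helper, and it assumes scores follow the convention above (higher `logit_bonafide` = more bonafide):

```python
import numpy as np


def eer_threshold(bonafide_scores: np.ndarray, spoof_scores: np.ndarray):
    """Return (EER, threshold) for scores where higher = more bonafide.

    A sample is accepted as bonafide when its score exceeds the threshold.
    """
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.ones_like(bonafide_scores), np.zeros_like(spoof_scores)])
    order = np.argsort(scores)
    scores, labels = scores[order], labels[order]
    n_bona, n_spoof = len(bonafide_scores), len(spoof_scores)
    # Sweep each sorted score as a candidate threshold:
    # FRR = fraction of bonafide at or below it, FAR = fraction of spoof above it.
    frr = np.cumsum(labels) / n_bona
    far = 1.0 - np.cumsum(1 - labels) / n_spoof
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2
    return eer, scores[idx]


# Toy example with perfectly separable scores
bona = np.array([1.0, 2.0, 3.0])
spoof = np.array([-3.0, -2.0, -1.0])
eer, thr = eer_threshold(bona, spoof)
print(f"EER={eer:.3f} at threshold={thr:.3f}")
```

On real data, collect `logit_bonafide` scores from the inference snippet above for each labeled utterance and pass the two groups into such a helper; the returned threshold can then replace the default in `classify()`.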
## Limitations and notes
- This is a pre-release model.
- Significantly stronger models are planned for Q3–Q4 2026 — stay tuned.
## License
MIT (see the license field in the model repo header).
## Contacts

- TG channel: https://t.me/korallll_ai
- Email: [email protected]
- Website: https://lab260.ru/