# BigVGAN with different mel spectrogram input

These BigVGAN checkpoints were obtained by continuing the training of https://huggingface.co/nvidia/bigvgan_v2_24khz_100band_256x, with the input mel spectrogram generated by the following code from [vocos]:

```python
import torch
import torchaudio

# FeatureExtractor (base class) and safe_log come from the vocos package.
from vocos.feature_extractors import FeatureExtractor
from vocos.modules import safe_log


class MelSpectrogramFeatures(FeatureExtractor):
    def __init__(self, sample_rate=24000, n_fft=1024, hop_length=256, n_mels=100, padding="center"):
        super().__init__()
        if padding not in ["center", "same"]:
            raise ValueError("Padding must be 'center' or 'same'.")
        self.padding = padding
        self.mel_spec = torchaudio.transforms.MelSpectrogram(
            sample_rate=sample_rate,
            n_fft=n_fft,
            hop_length=hop_length,
            n_mels=n_mels,
            center=padding == "center",
            power=1,  # magnitude (not power) mel spectrogram
        )

    def forward(self, audio, **kwargs):
        if self.padding == "same":
            # Reflect-pad manually so the frame count follows the "same" convention.
            pad = self.mel_spec.win_length - self.mel_spec.hop_length
            audio = torch.nn.functional.pad(audio, (pad // 2, pad // 2), mode="reflect")
        mel = self.mel_spec(audio)
        features = safe_log(mel)  # log-compress with clipping to avoid log(0)
        return features
```

Training was done with `segment_size=65536` (unchanged) and `batch_size=24` (vs. 32 used by the Nvidia team). The final eval PESQ is 4.340, compared to 4.362 for the Nvidia checkpoint evaluated with its own mel spectrogram code.
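
For reference, here is a minimal inference sketch. It assumes the upstream BigVGAN package from https://github.com/NVIDIA/BigVGAN (its `BigVGAN.from_pretrained`, `remove_weight_norm`, and `use_cuda_kernel` usage), that a checkpoint from this repo is packaged the same way as the base model, and that `MelSpectrogramFeatures` is importable from vocos as shown above; the checkpoint path and input file below are placeholders:

```python
import torch
import torchaudio

from bigvgan import BigVGAN                                   # from the NVIDIA BigVGAN repo
from vocos.feature_extractors import MelSpectrogramFeatures   # same class as shown above

# Placeholder path: point this at one of the checkpoints from this repo.
model = BigVGAN.from_pretrained("path/to/this/checkpoint", use_cuda_kernel=False)
model.remove_weight_norm()
model.eval()

feature_extractor = MelSpectrogramFeatures()  # 24 kHz, 100 mel bands, hop 256

wav, sr = torchaudio.load("example.wav")      # (channels, samples)
wav = torchaudio.functional.resample(wav, sr, 24000)
wav = wav.mean(dim=0, keepdim=True)           # mono, shape (1, samples)

with torch.inference_mode():
    mel = feature_extractor(wav)   # (1, 100, frames), vocos-style log-mel
    audio = model(mel)             # (1, 1, samples_out), 24 kHz waveform
```

The only intended difference from the upstream usage example is that the conditioning mel comes from the vocos extractor above rather than from BigVGAN's own mel spectrogram code.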
