Sylber

This is the official implementation of Sylber: Syllabic Embedding Representation of Speech from Raw Audio.

Sylber is the first model of its kind to yield extremely short token sequences from raw audio (4.27 tokens/sec on average) through dynamic tokenization at syllable granularity.

The model was developed and trained by the Berkeley Speech Group.

Installation

The model can be installed from PyPI for inference.

pip install sylber

Usage


from sylber import Segmenter

# Loading Sylber
segmenter = Segmenter(model_ckpt="sylber")


# Run Sylber
wav_file = "samples/sample.wav"

outputs = segmenter(wav_file, in_second=True)  # set in_second=False to get segments in frame numbers instead of seconds

# outputs = {"segments": numpy array of [start, end] for each segment,
#            "segment_features": numpy array of segment-averaged features,
#            "hidden_states": numpy array of raw features used for segmentation}
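
To show how the three output fields relate to each other, here is a minimal sketch that mean-pools frame-level features over each segment and computes a token rate. It is not Sylber's actual implementation and does not call the sylber package. The helper names (average_segments, tokens_per_second), the synthetic arrays, the end-exclusive frame spans, and the frame rate of 50 frames per second are all assumptions made for illustration.

```python
import numpy as np


def average_segments(hidden_states: np.ndarray, segments: np.ndarray) -> np.ndarray:
    """Mean-pool frame features over each segment.

    hidden_states: (num_frames, dim) array of frame-level features.
    segments: (num_segments, 2) integer array of [start, end) frame indices
              (end-exclusive is an assumption made for this sketch).
    """
    pooled = [hidden_states[start:end].mean(axis=0) for start, end in segments]
    return np.stack(pooled)


def tokens_per_second(segments: np.ndarray, total_duration: float) -> float:
    """Number of segments (tokens) divided by the audio duration in seconds."""
    return len(segments) / total_duration


# Synthetic stand-ins for the "hidden_states" and "segments" outputs.
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((100, 8))        # 100 frames, 8-dim features
segments = np.array([[0, 20], [25, 60], [60, 95]])   # frame indices

segment_features = average_segments(hidden_states, segments)
print(segment_features.shape)  # (3, 8): one vector per segment

frame_rate = 50.0                     # hypothetical frames per second
segments_sec = segments / frame_rate  # same boundaries, expressed in seconds
duration = hidden_states.shape[0] / frame_rate
print(tokens_per_second(segments_sec, duration))  # 1.5
```

With real model outputs, the same token-rate calculation applies to outputs["segments"] when in_second=True, provided the audio duration is known.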

Training

To train the model, see https://github.com/Berkeley-Speech-Group/sylber.


license: apache-2.0
