Sylber

This is the official implementation of Sylber: Syllabic Embedding Representation of Speech from Raw Audio.

Sylber is the first model of its kind to yield extremely short token sequences from raw audio (4.27 tokens/sec on average) through dynamic tokenization at syllable granularity.

The model was developed and trained by the Berkeley Speech Group.

Installation

The model can be installed from PyPI for inference.

pip install sylber

Usage


from sylber import Segmenter

# Loading Sylber
segmenter = Segmenter(model_ckpt="sylber")


# Run Sylber
wav_file = "samples/sample.wav"

outputs = segmenter(wav_file, in_second=True) # Set in_second=False to get segment boundaries in frame numbers instead of seconds.

# outputs = {"segments": numpy array of [start, end] of each segment,
#            "segment_features": numpy array of segment-averaged features,
#            "hidden_states": numpy array of raw features used for segmentation}
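
As a rough illustration of how the returned segments might be consumed, the sketch below computes the segment count, mean segment duration, and token rate. It assumes "segments" holds [start, end] pairs in seconds (that is, in_second=True), as described in the comment above. The helper name summarize_segments, the audio_duration parameter, and the mock values are hypothetical and not part of the sylber package.

```python
def summarize_segments(outputs, audio_duration):
    """Summarize a Sylber-style output dict.

    Hypothetical helper, not part of the sylber package.
    Assumes outputs["segments"] is a sequence (list or numpy array)
    of [start, end] pairs in seconds, and audio_duration is the
    clip length in seconds.
    """
    durations = [float(end) - float(start) for start, end in outputs["segments"]]
    n = len(durations)
    return {
        "num_segments": n,
        "mean_duration": sum(durations) / n if n else 0.0,
        "tokens_per_second": n / audio_duration,
    }

# Mock values standing in for segmenter(wav_file, in_second=True);
# a real call returns numpy arrays, which iterate the same way.
mock_outputs = {"segments": [[0.10, 0.32], [0.32, 0.58], [0.70, 0.95]]}
summary = summarize_segments(mock_outputs, audio_duration=1.0)
print(summary)
```

With real output, audio_duration would come from the audio file itself, for example the number of samples divided by the sample rate.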

Training

Please check https://github.com/Berkeley-Speech-Group/sylber for training the model.

license: apache-2.0