Model Description

The model uses the Wav2Vec2 architecture, trained on the SUPERB dataset for the keyword spotting task, and was fine-tuned to identify the dental click utterance (https://en.wikipedia.org/wiki/Dental_click) in speech. It was trained for 10 epochs on a limited quantity of speech (~1.5 hours) from a single speaker. The model should therefore not be assumed to generalize to other speakers or languages without further training data or rigorous testing.

The model was evaluated on a held-out test set comprising 20% of the available data and scored 97% accuracy.

Uses

The model can be used via the transformers library or via the Hugging Face hosted Inference API on the model page. I would caution against using the 'Record from browser' option, as the model may erroneously identify the user's mouse click as a speech utterance. Audio files for upload should be 1 second long, in WAV format, with 16-bit signed integer PCM encoding.
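As a rough illustration of the input requirements above, the sketch below checks a file with Python's standard `wave` module before passing it to the transformers `audio-classification` pipeline. The helper names `check_wav` and `classify` and the 0.05 s duration tolerance are assumptions made for this example and are not part of the model. The returned label names depend on the model's configuration and are not shown here. Decoding a file path in the pipeline typically requires ffmpeg to be installed.

```python
import wave

MODEL_ID = "JBJoyce/DENTAL_CLICK_classifier"
EXPECTED_SECONDS = 1.0     # clip length stated on this card
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit PCM


def check_wav(path, tolerance=0.05):
    """Return a list of problems; an empty list means the file matches
    the stated input format. `tolerance` (seconds) is an assumed slack,
    not a value from the model card."""
    try:
        with wave.open(path, "rb") as wav:
            width = wav.getsampwidth()
            duration = wav.getnframes() / wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # The wave module only reads PCM WAV; other encodings fail here.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"expected 16-bit samples, got {8 * width}-bit")
    if abs(duration - EXPECTED_SECONDS) > tolerance:
        problems.append(f"expected ~1 s of audio, got {duration:.2f} s")
    return problems


def classify(path):
    """Validate the file, then run it through the hosted model.
    Hypothetical wrapper: downloads the model on first use."""
    from transformers import pipeline  # imported lazily; needs network access

    problems = check_wav(path)
    if problems:
        raise ValueError("; ".join(problems))
    classifier = pipeline("audio-classification", model=MODEL_ID)
    return classifier(path)  # list of {"label": ..., "score": ...} dicts


# Example (not run here): predictions = classify("my_clip.wav")
```

Validating first gives a clear error message for clips that are too long, too short, or in the wrong encoding, instead of a silent misclassification.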

Model size: 94.6M params (Safetensors, tensor type F32)

Dataset used to train JBJoyce/DENTAL_CLICK_classifier