---
license: cc-by-4.0
language:
- qu
metrics:
- cer
- wer
pipeline_tag: automatic-speech-recognition
---

## Usage

The model can be used directly (without a language model) as follows:

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torch
import torchaudio

# load model and processor
processor = Wav2Vec2Processor.from_pretrained("ivangtorre/wav2vec2-xls-r-300m-quechua")
model = Wav2Vec2ForCTC.from_pretrained("ivangtorre/wav2vec2-xls-r-300m-quechua")

# load the audio file; torchaudio.load returns (waveform, sample_rate)
# the model expects 16 kHz mono audio, so the waveform has shape [1, time]
waveform, sample_rate = torchaudio.load("quechua000573.wav")

# retrieve logits (no gradients are needed for inference)
with torch.no_grad():
    logits = model(waveform).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print("HF prediction: ", transcription)
```
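
The "take argmax and decode" step above is greedy CTC decoding. The sketch below illustrates roughly what that decoding does with the per-frame predictions: merge consecutive repeats, drop the blank token, and turn word delimiters into spaces. It is a simplified illustration, not the library's implementation. The function `ctc_greedy_decode` and the toy frame sequence are hypothetical; the `<pad>` blank and `|` delimiter follow the usual Wav2Vec2 CTC tokenizer defaults, which this model's tokenizer is assumed to share.

```python
from itertools import groupby

# Assumed special tokens (Wav2Vec2 CTC tokenizer defaults).
BLANK = "<pad>"        # the padding token doubles as the CTC blank
WORD_DELIMITER = "|"   # marks word boundaries in the vocabulary


def ctc_greedy_decode(frame_tokens):
    """Collapse per-frame argmax tokens into a transcription string.

    Hypothetical helper for illustration only.
    """
    # 1. merge consecutive repeats of the same token
    collapsed = [token for token, _ in groupby(frame_tokens)]
    # 2. drop blanks; a blank between two identical tokens keeps both
    chars = [token for token in collapsed if token != BLANK]
    # 3. map word delimiters to spaces
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


# Toy per-frame predictions (made up, not model output).
frames = ["h", "h", "e", "<pad>", "l", "l", "<pad>", "l", "o", "|", "|", "h", "i"]
decoded = ctc_greedy_decode(frames)
print(decoded)
```

Note how the blank between the two runs of `l` preserves the double letter, whereas the repeated `h` frames at the start collapse into a single character.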