---
license: apache-2.0
language:
- ru
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- asr
- Pytorch
- pruned
- audio
- automatic-speech-recognition
---

# Whisper-tiny-ru-pruned

## Model info
This is a pruned version of the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) model with only Russian tokens kept.
Pruning was done without any fine-tuning, using the method from [this post](https://medium.com/m/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fhow-to-adapt-a-multilingual-t5-model-for-a-single-language-b9f94f3d9c90).
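
The core of the method, in a minimal sketch: the decoder token embedding matrix (and the tied `proj_out` layer) is sliced down to the kept token ids. Here `keep_ids` is an illustrative name for the list of kept ids (see the selection sketch in the Size section below); remapping the tokenizer vocabulary and generation config is omitted.

```python
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# keep_ids: hypothetical sorted list of token ids to keep
old = model.model.decoder.embed_tokens.weight.data         # (51865, d_model)
new_emb = torch.nn.Embedding(len(keep_ids), old.shape[1])
new_emb.weight.data = old[keep_ids].clone()

model.model.decoder.embed_tokens = new_emb
model.proj_out = torch.nn.Linear(old.shape[1], len(keep_ids), bias=False)
model.proj_out.weight = new_emb.weight                     # keep output projection tied
model.config.vocab_size = len(keep_ids)
# NOTE: the tokenizer vocabulary and all special token ids in the
# generation config must be remapped to the new ids as well (omitted here).
```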

## Size
Only about 10% of the tokens were kept, including special Whisper tokens, added Whisper tokens, the 100 most popular tokens from the tokenizer, and the 3000 most popular Russian tokens computed by tokenizing a Russian text corpus.
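
A sketch of how such a token set might be assembled (the corpus file and the exact choice of the "top 100" tokenizer tokens are assumptions, not the original pruning script):

```python
from collections import Counter
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-tiny")

# count token frequencies over a Russian text corpus (hypothetical file)
counts = Counter()
with open("ru_corpus.txt", encoding="utf-8") as f:
    for line in f:
        counts.update(tokenizer.encode(line, add_special_tokens=False))

keep_ids = set(tokenizer.all_special_ids)                # special Whisper tokens
keep_ids.update(tokenizer.get_added_vocab().values())    # added Whisper tokens
keep_ids.update(range(100))                              # assumption: first 100 base BPE tokens
keep_ids.update(i for i, _ in counts.most_common(3000))  # most popular Russian tokens
keep_ids = sorted(keep_ids)
```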

The model is about 50% smaller than the original whisper-tiny:
|  | openai/whisper-tiny | waveletdeboshir/whisper-tiny-ru-pruned |
| :------ | :------ | :------ |
| Number of parameters | 38 M | 19.6 M |
| Number of parameters (with proj_out layer) | 57.6 M | 21.5 M |
| Model file size | 151 MB | 86 MB |
| vocab_size | 51865 | 4705 |
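
These numbers can be checked after downloading the model; note that `parameters()` counts the tied embedding/`proj_out` weights once, which corresponds to the first row of the table:

```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-tiny-ru-pruned")
# tied weights are deduplicated, so this matches the "Number of parameters" row
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f} M parameters")
print("vocab_size:", model.config.vocab_size)
```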

## Usage
The model can be used in the same way as the original Whisper:

```python
>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> import torchaudio

>>> # load audio (Whisper expects 16 kHz input)
>>> wav, sr = torchaudio.load("audio.wav")
>>> # resample if needed
>>> if sr != 16000:
...     wav = torchaudio.functional.resample(wav, sr, 16000)
...     sr = 16000

>>> # load model and processor
>>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-tiny-ru-pruned")
>>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-tiny-ru-pruned")

>>> input_features = processor(wav[0], sampling_rate=sr, return_tensors="pt").input_features 

>>> # generate token ids
>>> predicted_ids = model.generate(input_features)
>>> # decode token ids to text
>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
['<|startoftranscript|><|ru|><|transcribe|><|notimestamps|> Начинаем работу.<|endoftext|>']

```
The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True`.
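
For example, re-decoding the ids from the snippet above:

```python
>>> processor.batch_decode(predicted_ids, skip_special_tokens=True)
[' Начинаем работу.']
```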

## Other pruned whisper models
* [waveletdeboshir/whisper-base-ru-pruned](https://huggingface.co/waveletdeboshir/whisper-base-ru-pruned)
* [waveletdeboshir/whisper-small-ru-pruned](https://huggingface.co/waveletdeboshir/whisper-small-ru-pruned)

## Metrics
TODO

You can fine-tune this model on your data to achieve better performance, e.g. with the sketch below.
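
A minimal fine-tuning skeleton using the standard `Seq2SeqTrainer` setup might look like this; `train_dataset` and `data_collator` are placeholders you would build from your own audio/transcript pairs, and the hyperparameters are illustrative:

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-ru-pruned-ft",
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    warmup_steps=50,
    max_steps=1000,
    fp16=True,
)
trainer = Seq2SeqTrainer(
    model=model,                  # the model loaded in the Usage section
    args=args,
    train_dataset=train_dataset,  # placeholder: yields input_features + labels
    data_collator=data_collator,  # placeholder: pads features and label ids
    tokenizer=processor.feature_extractor,
)
trainer.train()
```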

## Colab for pruning
TODO