ypluit committed on
Commit
df70866
·
1 Parent(s): f2523c1

Update README.md

  1. README.md +89 -0
README.md CHANGED
@@ -1,3 +1,92 @@
1
  ---
2
+ language:
3
+ - ko
4
  license: cc-by-4.0
5
+ library_name: nemo
6
+ datasets:
7
+ - RealCallData
8
+ thumbnail: null
9
+ tags:
10
+ - automatic-speech-recognition
11
+ - speech
12
+ - audio
13
+ - Citrinet1024
14
+ - NeMo
15
+ - pytorch
16
+ model-index:
17
+ - name: stt_kr_citrinet1024_PublicCallCenter_1000H_0.22
18
+ results: []
19
  ---
20
+
21
+ ## Model Overview
22
+
23
+ A Citrinet-1024 speech recognition model, built with NVIDIA NeMo, for transcribing Korean call center telephone audio.
24
+
25
+ ## NVIDIA NeMo: Training
26
+
27
+ To train, fine-tune, or experiment with the model, you need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
28
+ ```
29
+ pip install "nemo_toolkit[all]"
30
+ ```
31
+
32
+ ## How to Use this Model
33
+
34
+ The model is available for use in the NeMo toolkit [1], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
35
+
36
+
37
+ ### Automatically instantiate the model
38
+
39
+ ```python
40
+ import nemo.collections.asr as nemo_asr
41
+ asr_model = nemo_asr.models.ASRModel.from_pretrained("ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.22")
42
+ ```
43
+
44
+
45
+ ### Transcribing using Python
46
+ First, let's get a sample
47
+ ```
48
+ # Use any Korean telephone speech recording saved as a WAV file, e.g. sample-kr.wav
49
+ ```
50
+ Then simply do:
51
+ ```python
52
+ asr_model.transcribe(['sample-kr.wav'])
53
+ ```
54
+
55
+ ### Transcribing many audio files
56
+
57
+ ```shell
58
+ python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.22" audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
59
+ ```
60
+
61
+ ### Input
62
+
63
+ This model accepts 16 kHz mono-channel audio (WAV files) as input.
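
As a quick pre-flight check, the sketch below uses Python's standard `wave` module to confirm that a file matches this input format before transcription. The helper name `is_16k_mono_wav` is hypothetical and not part of NeMo; it only inspects the WAV header.

```python
import wave


def is_16k_mono_wav(path):
    """Return True if the WAV file at `path` is 16 kHz and single-channel.

    Hypothetical helper (not part of NeMo); it reads only the WAV header.
    """
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == 16000 and wav_file.getnchannels() == 1
```

Files that fail the check, such as 8 kHz or stereo recordings, would likely need to be converted to 16 kHz mono before being passed to `asr_model.transcribe`.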
64
+
65
+ ### Output
66
+
67
+ This model provides transcribed speech as a string for a given audio sample.
68
+
69
+
70
+ ## Model Architecture
71
+
72
+ See the NeMo toolkit [1] and its reference papers.
73
+ ## Training
74
+
75
+ Trained for about 30 days on two A6000 GPUs.
76
+
77
+ ### Datasets
78
+
79
+ Private real-world call center data (1,100 hours).
80
+
81
+ ## Performance
82
+
83
+ Character error rate (CER) < 0.13
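
For reference, CER is conventionally the character-level edit distance between the reference transcript and the model output, divided by the number of reference characters. The sketch below is a minimal standard-library implementation of that definition. It is an assumption that the figure above was computed this way, and the function name `cer` is illustrative.

```python
def cer(reference, hypothesis):
    """Character error rate: Levenshtein distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j]; it starts as the distance from "" to each prefix.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]  # distance from reference[:i] to the empty hypothesis
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(reference)
```

Under this definition, a CER of 0.13 corresponds to roughly 13 character errors per 100 reference characters.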
84
+
85
+ ## Limitations
86
+
87
+ This model was trained on 650 hours of Korean telephone speech from call center customer service. It may perform poorly on general-purpose dialogue and on specific accents.
88
+
89
+ ## References
90
+
91
+
92
+ [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)