lIlBrother committed
Commit 63356a6
1 Parent(s): 4363fe8

Update: revise readme

Files changed (1)
  1. README.md +84 -1
README.md CHANGED
---
language:
- ko
license: apache-2.0
library_name: transformers
tags:
- audio
- automatic-speech-recognition
datasets:
- KsponSpeech
metrics:
- wer
---

# ko-spelling-wav2vec2-conformer-del-1s

## Table of Contents
- [ko-spelling-wav2vec2-conformer-del-1s](#ko-spelling-wav2vec2-conformer-del-1s)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
- [Evaluation](#evaluation)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)

## Model Details
- **Model Description:**
This model was pre-trained from scratch on the wav2vec2-conformer base architecture. <br />
It was then fine-tuned on KsponSpeech with Wav2Vec2ConformerForCTC (a minimal loading sketch follows this list). <br />

- Dataset: [AIHub KsponSpeech](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=123) <br />
The training datasets were created by preprocessing this data ourselves. <br />
The model was trained on **spelling-transcription** data (numbers and English follow their respective written forms). <br />

- **Developed by:** TADev (@lIlBrother, @ddobokki, @jp42maru)
- **Language(s):** Korean
- **License:** apache-2.0
- **Parent Model:** See the [wav2vec2-conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer) documentation for more information about the pre-trained base model. (This model was pre-trained from scratch on the wav2vec2-conformer base architecture.)

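For reference, here is a minimal sketch of loading a fine-tuned checkpoint with the `Wav2Vec2ConformerForCTC` class mentioned above. The repository id is a placeholder, not this model's confirmed Hub id.

```python
from transformers import AutoFeatureExtractor, AutoTokenizer, Wav2Vec2ConformerForCTC

# Placeholder Hub id; substitute this model's actual repository id.
repo_id = "<namespace>/ko-spelling-wav2vec2-conformer-del-1s"

model = Wav2Vec2ConformerForCTC.from_pretrained(repo_id)            # conformer encoder with a CTC head
feature_extractor = AutoFeatureExtractor.from_pretrained(repo_id)   # turns raw 16 kHz audio into model inputs
tokenizer = AutoTokenizer.from_pretrained(repo_id)                  # maps CTC ids back to text
```
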
## Evaluation
Evaluation simply uses `load_metric("wer")` from the Hugging Face `datasets` library (see the sketch below). <br />

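As a concrete reference for that call, here is a minimal sketch; the `predictions` and `references` lists are hypothetical placeholders, not the actual KsponSpeech evaluation split.

```python
from datasets import load_metric

wer_metric = load_metric("wer")

# Hypothetical placeholder data: spelling-transcription hypotheses vs. references.
predictions = ["안녕하세요 123 테스트입니다."]
references = ["안녕하세요 123 테스트 입니다."]

print(wer_metric.compute(predictions=predictions, references=references))
```
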
## How to Get Started With the Model
```python
import unicodedata

import librosa
from pyctcdecode import build_ctcdecoder
from transformers import (
    AutoConfig,
    AutoFeatureExtractor,
    AutoModelForCTC,
    AutoTokenizer,
    Wav2Vec2ProcessorWithLM,
)
from transformers.pipelines import AutomaticSpeechRecognitionPipeline

# Load the model, tokenizer, and the modules needed for inference.
# model_config_path, model_name_or_path, and audio_path are placeholders to fill in yourself.
config = AutoConfig.from_pretrained(model_config_path)
model = AutoModelForCTC.from_pretrained(
    model_name_or_path,
    config=config,
)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
beamsearch_decoder = build_ctcdecoder(
    labels=list(tokenizer.encoder.keys()),
    kenlm_model_path=None,
)
processor = Wav2Vec2ProcessorWithLM(
    feature_extractor=feature_extractor, tokenizer=tokenizer, decoder=beamsearch_decoder
)

# Plug the modules defined above into the pipeline used for actual inference.
asr_pipeline = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    decoder=processor.decoder,
    device=-1,
)

# Load the audio file and run inference with the chosen beam-search parameters.
raw_data, _ = librosa.load(audio_path, sr=16000)
kwargs = {"decoder_kwargs": {"beam_width": 100}}
pred = asr_pipeline(inputs=raw_data, **kwargs)["text"]
# The model emits Jamo-decomposed Unicode text, so it needs to be converted back to a regular string.
result = unicodedata.normalize("NFC", pred)
print(result)
# 안녕하세요 123 테스트입니다.
```
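
Because that final normalization step is easy to overlook, here is a small, model-independent illustration of how composed Hangul and Jamo-decomposed output differ, and what `unicodedata.normalize("NFC", ...)` does:

```python
import unicodedata

text = "안녕하세요"                                   # composed Hangul syllables
decomposed = unicodedata.normalize("NFD", text)      # Jamo-decomposed, similar to raw CTC output
recomposed = unicodedata.normalize("NFC", decomposed)

print(decomposed == text)          # False: the code points are still decomposed Jamo
print(recomposed == text)          # True: normalization recomposes the syllables
print(len(decomposed), len(text))  # more code points before recomposition (e.g. 12 vs. 5)
```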