---
language:
- ko # Example: fr
license: apache-2.0 # Example: apache-2.0 or any license from https://hf.co/docs/hub/repositories-licenses
library_name: transformers # Optional. Example: keras or any library from https://github.com/huggingface/hub-docs/blob/main/js/src/lib/interfaces/Libraries.ts
tags:
- audio
- automatic-speech-recognition
datasets:
- KsponSpeech
metrics:
- wer # Example: wer. Use metric id from https://hf.co/metrics
---

# ko-spelling-wav2vec2-conformer-del-1s

## Table of Contents

- [ko-spelling-wav2vec2-conformer-del-1s](#ko-spelling-wav2vec2-conformer-del-1s)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
- [Evaluation](#evaluation)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)

## Model Details

- **Model Description:**
This model was pre-trained from scratch with the wav2vec2-conformer base architecture. <br />
It was then fine-tuned on KsponSpeech using Wav2Vec2ConformerForCTC. <br />

- **Dataset:** [AIHub KsponSpeech](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=123) <br />
The training datasets were created by preprocessing this raw data. <br />
This model was trained on **orthographic transcription** data (numbers and English keep their respective written notations). <br />

- **Developed by:** TADev (@lIlBrother, @ddobokki, @jp42maru)
- **Language(s):** Korean
- **License:** apache-2.0
- **Parent Model:** See the [wav2vec2-conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer) documentation for more information about the pre-trained base model. (This model was pre-trained from scratch with the wav2vec2-conformer base architecture.)

## Evaluation

Evaluation simply uses `load_metric("wer")` from the huggingface `datasets` library. <br />

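For reference, the word error rate behind that metric is just the word-level edit distance divided by the number of reference words. A minimal standalone sketch (not the `datasets` implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("안녕하세요 123 테스트입니다", "안녕하세요 123 테스트임니다"))  # one substitution out of three words
```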
## How to Get Started With the Model

```python
import unicodedata

import librosa
from pyctcdecode import build_ctcdecoder
from transformers import (
    AutoConfig,
    AutoFeatureExtractor,
    AutoModelForCTC,
    AutoTokenizer,
    Wav2Vec2ProcessorWithLM,
)
from transformers.pipelines import AutomaticSpeechRecognitionPipeline

# Load the model, tokenizer, and the other modules needed for inference.
# "model_name_or_path" is the model id on the Hub (or a local path);
# "audio_path" points at the audio file to transcribe.
config = AutoConfig.from_pretrained(model_name_or_path)
model = AutoModelForCTC.from_pretrained(
    model_name_or_path,
    config=config,
)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
beamsearch_decoder = build_ctcdecoder(
    labels=list(tokenizer.encoder.keys()),
    kenlm_model_path=None,
)
processor = Wav2Vec2ProcessorWithLM(
    feature_extractor=feature_extractor, tokenizer=tokenizer, decoder=beamsearch_decoder
)

# Plug the modules defined above into the pipeline used for actual inference.
asr_pipeline = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    decoder=processor.decoder,
    device=-1,
)

# Load the audio file and run inference with a specific beam-search width.
raw_data, _ = librosa.load(audio_path, sr=16000)
kwargs = {"decoder_kwargs": {"beam_width": 100}}
pred = asr_pipeline(inputs=raw_data, **kwargs)["text"]
# The model emits decomposed-jamo Unicode text, so the prediction needs to be
# normalized back into a regular composed string.
result = unicodedata.normalize("NFC", pred)
print(result)
# 안녕하세요 123 테스트입니다.
```
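To see why the final `unicodedata.normalize("NFC", ...)` step matters, here is a small self-contained sketch of the decomposed-jamo representation the model emits versus the composed string a user expects:

```python
import unicodedata

# NFD splits each Hangul syllable block into its individual jamo code points,
# which is the form the CTC vocabulary works in.
decomposed = unicodedata.normalize("NFD", "안녕하세요")
# NFC recombines the jamo back into syllable blocks.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # decomposed has more code points
print(composed == "안녕하세요")  # True
```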