Update README.md
README.md
CHANGED
---
license: apache-2.0
language:
- th
- en
base_model:
- openai/whisper-medium
pipeline_tag: automatic-speech-recognition
library_name: transformers
metrics:
- wer
---

# Pathumma Whisper Medium (Th)

## Model Description
Additional information is needed

## Quickstart
You can transcribe audio files using the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) class with the following code snippet:
```python
import torch
from transformers import pipeline

# Use the GPU with bfloat16 when CUDA is available; otherwise fall back to CPU with float32.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

lang = "th"
task = "transcribe"

pipe = pipeline(
    task="automatic-speech-recognition",
    model="nectec/Pathumma-whisper-th-medium",
    torch_dtype=torch_dtype,
    device=device,
)
# Set the decoder prompt so the model transcribes in Thai.
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language=lang, task=task)

# Replace "audio_path.wav" with the path to your audio file.
text = pipe("audio_path.wav")["text"]
print(text)
```
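
The language and task can also be passed per call instead of being written to the model config. The sketch below is not part of the original model card. It assumes a recent `transformers` release in which the Whisper pipeline accepts `generate_kwargs` with `language` and `task`, and `chunk_length_s` for audio longer than 30 seconds. The helper names `build_generate_kwargs` and `transcribe_file` are hypothetical, chosen only for illustration.

```python
def build_generate_kwargs(language: str = "th", task: str = "transcribe") -> dict:
    """Return decoding options to pass to the pipeline call via `generate_kwargs`."""
    return {"language": language, "task": task}


def transcribe_file(
    audio_path: str,
    model_id: str = "nectec/Pathumma-whisper-th-medium",
    chunk_length_s: int = 30,  # assumed chunk size for long-form audio
) -> str:
    """Transcribe one audio file (hypothetical helper, not from the model card)."""
    # Imported lazily so the helpers can be defined without loading torch.
    import torch
    from transformers import pipeline

    use_cuda = torch.cuda.is_available()
    pipe = pipeline(
        task="automatic-speech-recognition",
        model=model_id,
        torch_dtype=torch.bfloat16 if use_cuda else torch.float32,
        device="cuda" if use_cuda else "cpu",
        chunk_length_s=chunk_length_s,
    )
    # Language and task are supplied per call rather than stored in the config.
    result = pipe(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]
```

Calling `print(transcribe_file("audio_path.wav"))` should then behave like the snippet above, with the model downloaded on first use.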

## Limitations and Future Work
Additional information is needed

## Acknowledgements
We thank the research teams who have contributed to open speech models, including AIResearch, BiodatLab, Looloo Technology, SCB 10X, and OpenAI. We are grateful to Dr. Titipat Achakulwisut of BiodatLab for the evaluation pipeline, and to ThaiSC, the NSTDA Supercomputer Centre, for providing the LANTA supercomputer used for model training, fine-tuning, and evaluation.

## Pathumma Audio Team
*Pattara Tipaksorn*, Wayupuk Sommuang, Oatsada Chatthong, *Kwanchiva Thangthai*

## Citation
```
@misc{tipaksorn2024PathummaWhisper,
  title = { {Pathumma Whisper Medium (TH)} },
  author = { Pattara Tipaksorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
  url = { https://huggingface.co/nectec/Pathumma-whisper-th-medium },
  publisher = { Hugging Face },
  year = { 2024 },
}
```