TylorShine committed on
Commit
2abacad
1 Parent(s): 80cc0ff

Update README.md

Files changed (1)
  1. README.md +39 -0
README.md CHANGED
@@ -1,3 +1,42 @@
  ---
+ language: ja
+ tags:
+ - speech
  license: apache-2.0
  ---
+
+ # distilhubert-ft-japanese-50k
+
+ This model was fine-tuned (more precisely, trained further from the original checkpoint) on Japanese speech using the [JVS corpus](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus), the [Tsukuyomi-Chan corpus](https://tyc.rei-yumesaki.net/material/corpus/), [Amitaro's ITA corpus V2.1](https://amitaro.net/), and my own recordings of the [ITA corpus](https://github.com/mmorise/ita-corpus).
+
+
+ Original repositories (many thanks!):
+ - [S3PRL](https://github.com/s3prl/s3prl/tree/main/s3prl/pretrain)
+   - Used for training, with minor modifications to train on my own datasets.
+ - [distilhubert (hf)](https://huggingface.co/ntu-spml/distilhubert)
+
+
+ Note: Like the original, this model does not have a tokenizer because it was pretrained on audio alone. To use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more detailed explanation of how to fine-tune the model.
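
Creating a tokenizer starts with a vocabulary file. The following is a minimal sketch of building a character-level vocabulary, assuming the `|`, `[UNK]`, and `[PAD]` special tokens used in the linked blog. The helper names `build_char_vocab` and `save_vocab` and the sample transcripts are hypothetical, not part of this repository.

```python
import json
from pathlib import Path


def build_char_vocab(transcripts):
    """Map every non-whitespace character in the transcripts to an integer id."""
    chars = sorted({ch for text in transcripts for ch in text if not ch.isspace()})
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Special tokens following the linked blog's convention (an assumption here):
    # "|" as word delimiter, "[UNK]" for unknown characters, "[PAD]" for padding.
    for special in ("|", "[UNK]", "[PAD]"):
        vocab.setdefault(special, len(vocab))
    return vocab


def save_vocab(vocab, path):
    """Write the vocabulary as UTF-8 JSON without escaping Japanese characters."""
    Path(path).write_text(json.dumps(vocab, ensure_ascii=False), encoding="utf-8")


# Hypothetical transcripts; replace with the text of your labeled dataset.
transcripts = ["こんにちは", "こんばんは"]
vocab = build_char_vocab(transcripts)
print(len(vocab))
```

After calling `save_vocab(vocab, "vocab.json")`, the resulting file can be passed to a CTC tokenizer such as `Wav2Vec2CTCTokenizer`, as the blog describes.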
+
+ # Usage
+
+ See [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for more information on how to fine-tune the model. Note that the class `Wav2Vec2ForCTC` has to be replaced by `HubertForCTC`.
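
As a rough sketch of that class swap, the snippet below loads a checkpoint into `HubertForCTC` with a CTC head sized to a vocabulary. The repository id `TylorShine/distilhubert-ft-japanese-50k` is an assumption inferred from the commit author and the model card title, and the toy vocabulary and helper names are hypothetical. Because the checkpoint has no CTC head, the output layer would be newly initialized and must be trained.

```python
def ctc_head_kwargs(vocab):
    """Config overrides for a CTC head that matches the given vocabulary."""
    return {
        "vocab_size": len(vocab),
        "pad_token_id": vocab["[PAD]"],
        "ctc_loss_reduction": "mean",
    }


def load_for_ctc(model_id, vocab):
    """Load a HuBERT checkpoint with a freshly initialized CTC head."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import HubertForCTC

    return HubertForCTC.from_pretrained(model_id, **ctc_head_kwargs(vocab))


# Assumed repository id and a toy vocabulary, for illustration only.
MODEL_ID = "TylorShine/distilhubert-ft-japanese-50k"
vocab = {"あ": 0, "い": 1, "|": 2, "[UNK]": 3, "[PAD]": 4}
print(ctc_head_kwargs(vocab))
# model = load_for_ctc(MODEL_ID, vocab)  # downloads the checkpoint
```

The config values are kept in a separate function so they can be checked without downloading any weights.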
+
+ Note: This is not the best checkpoint; I think it would become more accurate with continued training. I'll try to continue when I have time.
+
+ ## Credits
+ - [JVS corpus](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus)
+
+ - [Tsukuyomi-Chan corpus](https://tyc.rei-yumesaki.net/material/corpus/)
+ ```
+ ■つくよみちゃんコーパス(CV.夢前黎)
+ https://tyc.rei-yumesaki.net/material/corpus/
+ ```
+
+ - [Amitaro's ITA corpus](https://amitaro.net/)
+ ```
+ あみたろの声素材工房
+ ```
+ [https://amitaro.net/](https://amitaro.net/)
+
+ Thanks!