added evaluation metric on cv8 test dataset

Files changed (11)

.gitignore +0 -1
.ipynb_checkpoints/README-checkpoint.md +122 -21
.ipynb_checkpoints/run-checkpoint.sh +1 -2
.ipynb_checkpoints/run_eval-checkpoint.sh +2 -4
README.md +17 -4
evaluation_no_lm/.ipynb_checkpoints/mozilla-foundation_common_voice_8_0_zh-HK_test_eval_results-checkpoint.txt +2 -0
evaluation_no_lm/log_mozilla-foundation_common_voice_8_0_zh-HK_test_predictions.txt +0 -0
evaluation_no_lm/log_mozilla-foundation_common_voice_8_0_zh-HK_test_targets.txt +0 -0
evaluation_no_lm/mozilla-foundation_common_voice_8_0_zh-HK_test_eval_results.txt +2 -0
run.sh +1 -2
run_eval.sh +2 -4

.gitignore CHANGED

@@ -1,2 +1 @@
 checkpoint-*/
-log_*

.ipynb_checkpoints/README-checkpoint.md CHANGED

@@ -6,13 +6,24 @@ tags:
 - automatic-speech-recognition
 - mozilla-foundation/common_voice_8_0
 - generated_from_trainer
-- zh-HK
-- robust-speech-event
 datasets:
-- common_voice
-model-index:
-- name: ''
-  results: []
+- mozilla-foundation/common_voice_8_0
+- name: XLS-R-300M - Chinese_HongKong (Cantonese)
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice 8
+      type: mozilla-foundation/common_voice_7_0
+      args: zh-HK
+    metrics:
+       - name: Test WER
+         type: wer
+         value: 0.8111349803079126
+       - name: Test CER
+         type: cer
+         value: 0.21962250882996914
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -22,8 +33,8 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - ZH-HK dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.6726
-- Wer: 0.9815
+- Loss: 1.4848
+- Wer: 0.8004
 ## Model description
@@ -51,23 +62,113 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 500
-- num_epochs: 10.0
+- num_epochs: 100.0
 - mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss | Wer    |
-|:-------------:|:-----:|:----:|:---------------:|:------:|
-| No log        | 1.0   | 183  | 47.8442         | 1.0    |
-| No log        | 2.0   | 366  | 6.3109          | 1.0    |
-| 41.8902       | 3.0   | 549  | 6.2392          | 1.0    |
-| 41.8902       | 4.0   | 732  | 5.9739          | 1.1123 |
-| 41.8902       | 5.0   | 915  | 4.9014          | 1.9474 |
-| 5.5817        | 6.0   | 1098 | 3.9892          | 1.0188 |
-| 5.5817        | 7.0   | 1281 | 3.5080          | 1.0104 |
-| 5.5817        | 8.0   | 1464 | 3.0797          | 0.9905 |
-| 3.5579        | 9.0   | 1647 | 2.8111          | 0.9836 |
-| 3.5579        | 10.0  | 1830 | 2.6726          | 0.9815 |
+| Training Loss | Epoch | Step  | Validation Loss | Wer    |
+|:-------------:|:-----:|:-----:|:---------------:|:------:|
+| No log        | 1.0   | 183   | 47.8442         | 1.0    |
+| No log        | 2.0   | 366   | 6.3109          | 1.0    |
+| 41.8902       | 3.0   | 549   | 6.2392          | 1.0    |
+| 41.8902       | 4.0   | 732   | 5.9739          | 1.1123 |
+| 41.8902       | 5.0   | 915   | 4.9014          | 1.9474 |
+| 5.5817        | 6.0   | 1098  | 3.9892          | 1.0188 |
+| 5.5817        | 7.0   | 1281  | 3.5080          | 1.0104 |
+| 5.5817        | 8.0   | 1464  | 3.0797          | 0.9905 |
+| 3.5579        | 9.0   | 1647  | 2.8111          | 0.9836 |
+| 3.5579        | 10.0  | 1830  | 2.6726          | 0.9815 |
+| 2.7771        | 11.0  | 2013  | 2.7177          | 0.9809 |
+| 2.7771        | 12.0  | 2196  | 2.3582          | 0.9692 |
+| 2.7771        | 13.0  | 2379  | 2.1708          | 0.9757 |
+| 2.3488        | 14.0  | 2562  | 2.0491          | 0.9526 |
+| 2.3488        | 15.0  | 2745  | 1.8518          | 0.9378 |
+| 2.3488        | 16.0  | 2928  | 1.6845          | 0.9286 |
+| 1.7859        | 17.0  | 3111  | 1.6412          | 0.9280 |
+| 1.7859        | 18.0  | 3294  | 1.5488          | 0.9035 |
+| 1.7859        | 19.0  | 3477  | 1.4546          | 0.9010 |
+| 1.3898        | 20.0  | 3660  | 1.5147          | 0.9201 |
+| 1.3898        | 21.0  | 3843  | 1.4467          | 0.8959 |
+| 1.1291        | 22.0  | 4026  | 1.4743          | 0.9035 |
+| 1.1291        | 23.0  | 4209  | 1.3827          | 0.8762 |
+| 1.1291        | 24.0  | 4392  | 1.3437          | 0.8792 |
+| 0.8993        | 25.0  | 4575  | 1.2895          | 0.8577 |
+| 0.8993        | 26.0  | 4758  | 1.2928          | 0.8558 |
+| 0.8993        | 27.0  | 4941  | 1.2947          | 0.9163 |
+| 0.6298        | 28.0  | 5124  | 1.3151          | 0.8738 |
+| 0.6298        | 29.0  | 5307  | 1.2972          | 0.8514 |
+| 0.6298        | 30.0  | 5490  | 1.3030          | 0.8432 |
+| 0.4757        | 31.0  | 5673  | 1.3264          | 0.8364 |
+| 0.4757        | 32.0  | 5856  | 1.3131          | 0.8421 |
+| 0.3735        | 33.0  | 6039  | 1.3457          | 0.8588 |
+| 0.3735        | 34.0  | 6222  | 1.3450          | 0.8473 |
+| 0.3735        | 35.0  | 6405  | 1.3452          | 0.9218 |
+| 0.3253        | 36.0  | 6588  | 1.3754          | 0.8397 |
+| 0.3253        | 37.0  | 6771  | 1.3554          | 0.8353 |
+| 0.3253        | 38.0  | 6954  | 1.3532          | 0.8312 |
+| 0.2816        | 39.0  | 7137  | 1.3694          | 0.8345 |
+| 0.2816        | 40.0  | 7320  | 1.3953          | 0.8296 |
+| 0.2397        | 41.0  | 7503  | 1.3858          | 0.8293 |
+| 0.2397        | 42.0  | 7686  | 1.3959          | 0.8402 |
+| 0.2397        | 43.0  | 7869  | 1.4350          | 0.9318 |
+| 0.2084        | 44.0  | 8052  | 1.4004          | 0.8806 |
+| 0.2084        | 45.0  | 8235  | 1.3871          | 0.8255 |
+| 0.2084        | 46.0  | 8418  | 1.4060          | 0.8252 |
+| 0.1853        | 47.0  | 8601  | 1.3992          | 0.8501 |
+| 0.1853        | 48.0  | 8784  | 1.4186          | 0.8252 |
+| 0.1853        | 49.0  | 8967  | 1.4120          | 0.8165 |
+| 0.1671        | 50.0  | 9150  | 1.4166          | 0.8214 |
+| 0.1671        | 51.0  | 9333  | 1.4411          | 0.8501 |
+| 0.1513        | 52.0  | 9516  | 1.4692          | 0.8394 |
+| 0.1513        | 53.0  | 9699  | 1.4640          | 0.8391 |
+| 0.1513        | 54.0  | 9882  | 1.4501          | 0.8419 |
+| 0.133         | 55.0  | 10065 | 1.4134          | 0.8351 |
+| 0.133         | 56.0  | 10248 | 1.4593          | 0.8405 |
+| 0.133         | 57.0  | 10431 | 1.4560          | 0.8389 |
+| 0.1198        | 58.0  | 10614 | 1.4734          | 0.8334 |
+| 0.1198        | 59.0  | 10797 | 1.4649          | 0.8318 |
+| 0.1198        | 60.0  | 10980 | 1.4659          | 0.8100 |
+| 0.1109        | 61.0  | 11163 | 1.4784          | 0.8119 |
+| 0.1109        | 62.0  | 11346 | 1.4938          | 0.8149 |
+| 0.1063        | 63.0  | 11529 | 1.5050          | 0.8152 |
+| 0.1063        | 64.0  | 11712 | 1.4773          | 0.8176 |
+| 0.1063        | 65.0  | 11895 | 1.4836          | 0.8261 |
+| 0.0966        | 66.0  | 12078 | 1.4979          | 0.8157 |
+| 0.0966        | 67.0  | 12261 | 1.4603          | 0.8048 |
+| 0.0966        | 68.0  | 12444 | 1.4803          | 0.8127 |
+| 0.0867        | 69.0  | 12627 | 1.4974          | 0.8130 |
+| 0.0867        | 70.0  | 12810 | 1.4721          | 0.8078 |
+| 0.0867        | 71.0  | 12993 | 1.4644          | 0.8192 |
+| 0.0827        | 72.0  | 13176 | 1.4835          | 0.8138 |
+| 0.0827        | 73.0  | 13359 | 1.4934          | 0.8122 |
+| 0.0734        | 74.0  | 13542 | 1.4951          | 0.8062 |
+| 0.0734        | 75.0  | 13725 | 1.4908          | 0.8070 |
+| 0.0734        | 76.0  | 13908 | 1.4876          | 0.8124 |
+| 0.0664        | 77.0  | 14091 | 1.4934          | 0.8053 |
+| 0.0664        | 78.0  | 14274 | 1.4603          | 0.8048 |
+| 0.0664        | 79.0  | 14457 | 1.4732          | 0.8073 |
+| 0.0602        | 80.0  | 14640 | 1.4925          | 0.8078 |
+| 0.0602        | 81.0  | 14823 | 1.4812          | 0.8064 |
+| 0.057         | 82.0  | 15006 | 1.4950          | 0.8013 |
+| 0.057         | 83.0  | 15189 | 1.4785          | 0.8056 |
+| 0.057         | 84.0  | 15372 | 1.4856          | 0.7993 |
+| 0.0517        | 85.0  | 15555 | 1.4755          | 0.8034 |
+| 0.0517        | 86.0  | 15738 | 1.4813          | 0.8034 |
+| 0.0517        | 87.0  | 15921 | 1.4966          | 0.8048 |
+| 0.0468        | 88.0  | 16104 | 1.4883          | 0.8002 |
+| 0.0468        | 89.0  | 16287 | 1.4746          | 0.8023 |
+| 0.0468        | 90.0  | 16470 | 1.4697          | 0.7974 |
+| 0.0426        | 91.0  | 16653 | 1.4775          | 0.8004 |
+| 0.0426        | 92.0  | 16836 | 1.4852          | 0.8023 |
+| 0.0387        | 93.0  | 17019 | 1.4868          | 0.8004 |
+| 0.0387        | 94.0  | 17202 | 1.4785          | 0.8021 |
+| 0.0387        | 95.0  | 17385 | 1.4892          | 0.8015 |
+| 0.0359        | 96.0  | 17568 | 1.4862          | 0.8018 |
+| 0.0359        | 97.0  | 17751 | 1.4851          | 0.8007 |
+| 0.0359        | 98.0  | 17934 | 1.4846          | 0.7999 |
+| 0.0347        | 99.0  | 18117 | 1.4852          | 0.7993 |
+| 0.0347        | 100.0 | 18300 | 1.4848          | 0.8004 |
 ### Framework versions
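
Reviewer note: in the new training results table, training loss keeps falling through epoch 100, while validation loss is lowest at epoch 25 (1.2895) and WER is lowest at epoch 90 (0.7974); the model card reports the final epoch (1.4848 / 0.8004). The sketch below shows one way to find those rows from the markdown table. It is an illustration only: the `parse_rows` helper and the `TABLE` excerpt (six rows copied from the table above) are assumptions of this note, not part of the commit.

```python
# Excerpt of the training results table; rows copied verbatim from the diff.
TABLE = """\
| Training Loss | Epoch | Step  | Validation Loss | Wer    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| No log        | 1.0   | 183   | 47.8442         | 1.0    |
| 3.5579        | 10.0  | 1830  | 2.6726          | 0.9815 |
| 0.8993        | 25.0  | 4575  | 1.2895          | 0.8577 |
| 0.1671        | 50.0  | 9150  | 1.4166          | 0.8214 |
| 0.0468        | 90.0  | 16470 | 1.4697          | 0.7974 |
| 0.0347        | 100.0 | 18300 | 1.4848          | 0.8004 |
"""


def parse_rows(table):
    """Parse a Trainer-style markdown results table into a list of dicts."""
    rows = []
    # The first two lines are the header and the alignment row.
    for line in table.strip().splitlines()[2:]:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        train, epoch, step, val, wer = cells
        rows.append({
            # "No log" means no training loss had been logged yet.
            "train_loss": None if train == "No log" else float(train),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val),
            "wer": float(wer),
        })
    return rows


rows = parse_rows(TABLE)
best_wer = min(rows, key=lambda r: r["wer"])
best_val = min(rows, key=lambda r: r["val_loss"])
print("lowest WER at epoch", best_wer["epoch"], "->", best_wer["wer"])
print("lowest validation loss at epoch", best_val["epoch"], "->", best_val["val_loss"])
```

Running the same helper over the full table gives the same two epochs, which suggests the last checkpoint is not necessarily the best one by either measure.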

.ipynb_checkpoints/run-checkpoint.sh CHANGED

@@ -24,5 +24,4 @@ python run_speech_recognition_ctc.py \
     --do_eval \
 	--max_duration_in_seconds="6" \
 	--evaluation_strategy='epoch' \
-	--save_strategy='epoch' \
-    --ignore_data_skip \
+	--save_strategy='epoch'

.ipynb_checkpoints/run_eval-checkpoint.sh CHANGED

@@ -1,8 +1,6 @@
 python eval.py \
 --model_id="ivanlau/wav2vec2-large-xls-r-300m-cantonese" \
---dataset="speech-recognition-community-v2/dev_data" \
+--dataset="mozilla-foundation/common_voice_8_0" \
 --config="zh-HK" \
---split="validation" \
---chunk_length_s="5.0" \
---stride_length_s="1.0" \
+--split="test" \
 --log_outputs \

README.md CHANGED

@@ -7,10 +7,23 @@ tags:
 - mozilla-foundation/common_voice_8_0
 - generated_from_trainer
 datasets:
-- common_voice
-model-index:
-- name: ''
-  results: []
+- mozilla-foundation/common_voice_8_0
+- name: XLS-R-300M - Chinese_HongKong (Cantonese)
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice 8
+      type: mozilla-foundation/common_voice_7_0
+      args: zh-HK
+    metrics:
+       - name: Test WER
+         type: wer
+         value: 0.8111349803079126
+       - name: Test CER
+         type: cer
+         value: 0.21962250882996914
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You

evaluation_no_lm/.ipynb_checkpoints/mozilla-foundation_common_voice_8_0_zh-HK_test_eval_results-checkpoint.txt ADDED

@@ -0,0 +1,2 @@
+WER: 0.8111349803079126
+CER: 0.21962250882996914

evaluation_no_lm/log_mozilla-foundation_common_voice_8_0_zh-HK_test_predictions.txt ADDED

The diff for this file is too large to render.

evaluation_no_lm/log_mozilla-foundation_common_voice_8_0_zh-HK_test_targets.txt ADDED

The diff for this file is too large to render.

evaluation_no_lm/mozilla-foundation_common_voice_8_0_zh-HK_test_eval_results.txt ADDED

@@ -0,0 +1,2 @@
+WER: 0.8111349803079126
+CER: 0.21962250882996914
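
Reviewer note: the results file reports a WER of about 0.81 next to a CER of about 0.22. A likely reason for the gap, assuming the zh-HK transcripts are mostly written without spaces between words, is that whitespace tokenization treats a whole sentence as one "word", so a single wrong character counts as a full word error. The sketch below is a minimal, dependency-free illustration of both metrics; it is not the code in `eval.py` (which is not part of this diff), and the example sentences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edit distance divided by total reference length, over a corpus."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return errors / total


def wer(refs, hyps):
    # Words are whitespace-separated tokens.
    return error_rate(refs, hyps, str.split)


def cer(refs, hyps):
    # Every character counts, including spaces (a simplification of this sketch).
    return error_rate(refs, hyps, list)


# Toy example (made up): two wrong characters in an unsegmented sentence.
refs = ["我哋今日去飲茶"]
hyps = ["我哋今日去飲咖啡"]
print("WER:", wer(refs, hyps))
print("CER:", cer(refs, hyps))
```

In this toy case the reference is a single whitespace token, so the WER is 1.0 while the CER is 2/7. For unsegmented text the CER is usually the more informative of the two numbers.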

run.sh CHANGED

@@ -24,5 +24,4 @@ python run_speech_recognition_ctc.py \
     --do_eval \
 	--max_duration_in_seconds="6" \
 	--evaluation_strategy='epoch' \
-	--save_strategy='epoch' \
-    --ignore_data_skip \
+	--save_strategy='epoch'

run_eval.sh CHANGED

@@ -1,8 +1,6 @@
 python eval.py \
 --model_id="ivanlau/wav2vec2-large-xls-r-300m-cantonese" \
---dataset="speech-recognition-community-v2/dev_data" \
+--dataset="mozilla-foundation/common_voice_8_0" \
 --config="zh-HK" \
---split="validation" \
---chunk_length_s="5.0" \
---stride_length_s="1.0" \
+--split="test" \
 --log_outputs \
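
Reviewer note: the names of the files added under `evaluation_no_lm/` line up with the new `--dataset`, `--config` and `--split` values, and removing `log_*` from `.gitignore` is what lets the two log files be committed. The sketch below reproduces that naming pattern as inferred from the file names in this commit. The `output_file_names` helper is hypothetical; `eval.py` is not shown in the diff, so its actual logic may differ.

```python
def output_file_names(dataset, config, split):
    """Build result/log file names from the evaluation arguments.

    Pattern inferred from the files in this commit: "/" in the dataset
    name becomes "_", then the config and split are appended with "_".
    """
    dataset_id = "_".join(dataset.split("/") + [config, split])
    return {
        "results": f"{dataset_id}_eval_results.txt",
        "predictions": f"log_{dataset_id}_predictions.txt",
        "targets": f"log_{dataset_id}_targets.txt",
    }


names = output_file_names("mozilla-foundation/common_voice_8_0", "zh-HK", "test")
for kind, name in names.items():
    print(kind, "->", name)
```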