update model

Files changed (7)

README.md +69 -0
pytorch_model.bin +1 -1
runs/Feb02_22-48-04_job-7083fbbc-ffb8-4f9b-8706-99212ecf5dd3/1643843696.0612628/events.out.tfevents.1643843696.job-7083fbbc-ffb8-4f9b-8706-99212ecf5dd3.34573.1 +3 -0
runs/Feb02_22-48-04_job-7083fbbc-ffb8-4f9b-8706-99212ecf5dd3/events.out.tfevents.1643843696.job-7083fbbc-ffb8-4f9b-8706-99212ecf5dd3.34573.0 +3 -0
special_tokens_map.json +1 -1
tokenizer_config.json +1 -1
training_args.bin +1 -1

README.md ADDED

	@@ -0,0 +1,69 @@

+---
+language:
+- ca
+license: apache-2.0
+tags:
+- automatic-speech-recognition
+- mozilla-foundation/common_voice_8_0
+- collectivat/tv3_parla
+- projecte-aina/parlament_parla
+- generated_from_trainer
+- robust-speech-event
+datasets:
+- mozilla-foundation/common_voice_8_0
+- collectivat/tv3_parla
+- projecte-aina/parlament_parla
+model-index:
+- name: wav2vec2-xls-r-1b-ca
+  results:
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# wav2vec2-xls-r-1b-ca
+This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - CA dataset.
+## Model description
+Please check the original [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) model card. This is just a fine-tuned version of that model.
+## Intended uses & limitations
+Like any model trained on crowdsourced data, this model can reflect the biases and particularities of its training data and of the base model. Moreover, since this is a speech recognition model, it may underperform for some lower-resourced dialects of the Catalan language.
+## Training and evaluation data
+## Training procedure
+The data is preprocessed to remove characters not in the Catalan alphabet. Moreover, numbers are verbalized using code provided by [@ccoreilly](https://github.com/ccoreilly), which can be found in the text/ folder or [here](https://github.com/CollectivaT-dev/catotron-cpu/blob/master/text/numbers_ca.py).
+### Training results
+See the TensorBoard tab for the training profile and the evaluation results over the course of training. The model was evaluated on the test splits of each of the datasets used during training.
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 2e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 64
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 2000
+- num_epochs: 10.0
+- mixed_precision_training: Native AMP
+### Framework versions
+- Transformers 4.17.0.dev0
+- Pytorch 1.10.2+cu102
+- Datasets 1.18.3
+- Tokenizers 0.11.0
+# Thanks
+I want to thank both [@ccoreilly](https://github.com/ccoreilly) and [@gullabi](https://github.com/gullabi), who contributed their own resources and knowledge to making this model possible.
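
The hyperparameters listed in the README above can be related to each other with a little arithmetic. The following is a minimal sketch, not the training code used for this model: it assumes a single device (which is consistent with 8 × 8 = 64) and a hypothetical total step count, since the README does not state one. The `linear_lr` helper is an illustrative name that mirrors the usual "linear warmup, then linear decay" schedule.

```python
# Values copied from the README's hyperparameter list.
train_batch_size = 8
gradient_accumulation_steps = 8
learning_rate = 2e-05
warmup_steps = 2000

# Assumption: one device. The README does not state the device count.
num_devices = 1
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step: int, total_steps: int) -> float:
    """Linear warmup to learning_rate, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total number of optimizer steps, for illustration only.
hypothetical_total_steps = 20000

print("effective batch size:", effective_batch_size)
for step in (0, 1000, 2000, 11000, 20000):
    print(step, linear_lr(step, hypothetical_total_steps))
```

With gradient accumulation, the optimizer only steps once every 8 forward/backward passes, so the warmup of 2000 steps counts optimizer updates rather than individual batches.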

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:29986e5b3283319d48ecefdd68fd3bb0fe2061d61358907401c9fdc74053f315
+oid sha256:460a907cccb967dcaf1e86c147c373256527b490cd81f16e4118691f11540bc1
 size 3850543281
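
The diff above changes only the `oid` of a Git LFS pointer file: the repository stores this small text stub, while the weights themselves live in LFS storage. As a rough sketch of how such a pointer can be read and checked against downloaded bytes, the helpers below (`parse_lfs_pointer` and `matches_pointer` are illustrative names, not part of any Git LFS tooling) parse the three fields and compare size and SHA-256 digest.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that data has the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        return False
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# The new pointer for pytorch_model.bin, as shown in the diff.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:460a907cccb967dcaf1e86c147c373256527b490cd81f16e4118691f11540bc1
size 3850543281
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["digest"][:12], pointer["size"])
```

For a multi-gigabyte file like this one, hashing would normally be done in chunks with `hashlib.sha256().update(...)` rather than by loading the whole file into memory. Note that the size is unchanged between the two versions, so only the digest distinguishes them.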

runs/Feb02_22-48-04_job-7083fbbc-ffb8-4f9b-8706-99212ecf5dd3/1643843696.0612628/events.out.tfevents.1643843696.job-7083fbbc-ffb8-4f9b-8706-99212ecf5dd3.34573.1 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:247990135b2ab8e66109696533617ffee09d3b576aa5b09a97e1e755efea4cb9
+size 4808

runs/Feb02_22-48-04_job-7083fbbc-ffb8-4f9b-8706-99212ecf5dd3/events.out.tfevents.1643843696.job-7083fbbc-ffb8-4f9b-8706-99212ecf5dd3.34573.0 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be991b3e225de2c0a64b9d2532338398f4e6e63f66a343c0d1e0744f105a9ecd
+size 22862

special_tokens_map.json CHANGED

@@ -1 +1 @@

- {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "[UNK]", "pad_token": "[PAD]", "additional_special_tokens": [{"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}]}
+ {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "[UNK]", "pad_token": "[PAD]", "additional_special_tokens": [{"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}]}
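
The net effect of this change is that the repeated `<s>` and `</s>` entries in `additional_special_tokens` are dropped. How the duplicates arose, and how the author removed them, is not shown in the commit; the sketch below only reproduces the same result with a hypothetical `dedupe_special_tokens` helper that keeps the first occurrence of each entry.

```python
import json


def dedupe_special_tokens(token_map: dict) -> dict:
    """Return a copy with duplicate additional_special_tokens removed, order kept."""
    seen = set()
    unique = []
    for entry in token_map.get("additional_special_tokens", []):
        # Dicts are not hashable, so use a canonical JSON string as the key.
        key = json.dumps(entry, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return {**token_map, "additional_special_tokens": unique}


def make_token(content: str) -> dict:
    """Build one entry in the shape used by special_tokens_map.json."""
    return {
        "content": content,
        "single_word": False,
        "lstrip": False,
        "rstrip": False,
        "normalized": True,
    }


# The old map from the diff, with <s> and </s> each listed twice.
old_map = {
    "bos_token": "<s>",
    "eos_token": "</s>",
    "unk_token": "[UNK]",
    "pad_token": "[PAD]",
    "additional_special_tokens": [
        make_token("<s>"),
        make_token("</s>"),
        make_token("<s>"),
        make_token("</s>"),
    ],
}

new_map = dedupe_special_tokens(old_map)
print([t["content"] for t in new_map["additional_special_tokens"]])
```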

tokenizer_config.json CHANGED

	@@ -1 +1 @@
- {"unk_token": "[UNK]", "bos_token": "<s>", "eos_token": "</s>", "pad_token": "[PAD]", "do_lower_case": false, "word_delimiter_token": "|", "special_tokens_map_file": null, "tokenizer_file": null, "name_or_path": "PereLluis13/wav2vec2-xls-r-1b-ca", "tokenizer_class": "Wav2Vec2CTCTokenizer", "processor_class": "Wav2Vec2ProcessorWithLM"}
+ {"unk_token": "[UNK]", "bos_token": "<s>", "eos_token": "</s>", "pad_token": "[PAD]", "do_lower_case": false, "word_delimiter_token": "|", "special_tokens_map_file": null, "tokenizer_file": null, "name_or_path": "wav2vec2-xls-r-1b-ca", "tokenizer_class": "Wav2Vec2CTCTokenizer", "processor_class": "Wav2Vec2ProcessorWithLM"}

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0b4495658d9c79564d9db2345008309646fb4bbe6823c2bc2d99589d4387f305
+oid sha256:bf5dc7623df5813d5142da994775815df2fb6df73e2e3bae384d6f76b6bfdc81
 size 3055