metadata

language:
  - he
tags:
  - automatic-speech-recognition
  - robust-speech-event
  - he
  - generated_from_trainer
model-index:
  - name: wav2vec2-xls-r-300m-hebrew
    results: []

wav2vec2-xls-r-300m-hebrew

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m, trained on private datasets in two stages. It was first fine-tuned on a small dataset of good-quality samples. The resulting model was then fine-tuned on a large dataset that combines the small good-quality dataset, various samples from different sources, and an unlabeled dataset that was weakly labeled using a previously trained model.
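The weak-labeling step can be pictured with the minimal sketch below. It is an illustration only, not the pipeline used for this model: the `transcribe` callable stands in for whatever previously trained model produced the labels, and the `min_words` filter, the dictionary keys, and the function names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def weak_label(
    audio_paths: Iterable[str],
    transcribe: Callable[[str], str],
    min_words: int = 1,  # hypothetical filter, not described in this card
) -> List[Dict[str, object]]:
    """Attach machine-generated transcripts to unlabeled audio files."""
    labeled = []
    for path in audio_paths:
        text = transcribe(path).strip()
        # Skip clips for which the earlier model produced (almost) nothing.
        if len(text.split()) < min_words:
            continue
        labeled.append({"path": path, "text": text, "weak": True})
    return labeled


def validation_subset(samples: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only human-labeled samples, since weak labels are excluded from validation."""
    return [s for s in samples if not s.get("weak", False)]


if __name__ == "__main__":
    # Stand-in transcriber; a real one would run a speech recognition model.
    fake_outputs = {"a.wav": "shalom olam", "b.wav": "   "}
    weak = weak_label(fake_outputs, lambda p: fake_outputs[p])
    clean = [{"path": "c.wav", "text": "boker tov", "weak": False}]
    print(len(weak), len(validation_subset(clean + weak)))
```

Marking each sample with a `weak` flag makes it easy to keep machine-generated transcripts out of the validation split while still using them for training.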

Small dataset:

split	size (GB)	n_samples	duration (hrs)
train	4.19	20306	28
dev	1.05	5076	7

Large dataset:

split	size (GB)	n_samples	duration (hrs)
train	12.3	90777	69
dev	1.05	20246	14*
(*Weakly labeled data was not used in the validation set.)

After the first training, the model achieves:

On the small dataset:

Loss: 0.5438
WER: 0.1773

On the large dataset:

WER: 0.3811

After the second training, the model achieves:

On the small dataset:

WER: 0.1697

On the large dataset:

Loss: 0.4502
WER: 0.2318
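For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration for a single sentence pair; the function name is hypothetical, and the figures reported here were presumably produced by the training script's own metric, which may aggregate over the whole evaluation set rather than averaging per sentence.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 0.1773 corresponds to roughly 18 word errors per 100 reference words.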

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

First training

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 4
total_train_batch_size: 64
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 100.0
mixed_precision_training: Native AMP
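The total batch sizes listed above follow from the per-device batch size, the number of devices, and the gradient accumulation steps. The short check below reproduces that arithmetic; the function name is illustrative only.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of samples that contribute to one optimizer step (or one eval step)."""
    return per_device * num_devices * grad_accum_steps


# Values taken from the hyperparameter list above.
total_train = effective_batch_size(per_device=8, num_devices=2, grad_accum_steps=4)
# Gradient accumulation does not apply to evaluation.
total_eval = effective_batch_size(per_device=8, num_devices=2)

if __name__ == "__main__":
    print(total_train, total_eval)
```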

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
No log	3.15	1000	0.5203	0.4333
1.4284	6.31	2000	0.4816	0.3951
1.4284	9.46	3000	0.4315	0.3546
1.283	12.62	4000	0.4278	0.3404
1.283	15.77	5000	0.4090	0.3054
1.1777	18.93	6000	0.3893	0.3006
1.1777	22.08	7000	0.3968	0.2857
1.0994	25.24	8000	0.3892	0.2751
1.0994	28.39	9000	0.4061	0.2690
1.0323	31.54	10000	0.4114	0.2507
1.0323	34.7	11000	0.4021	0.2508
0.9623	37.85	12000	0.4032	0.2378
0.9623	41.01	13000	0.4148	0.2374
0.9077	44.16	14000	0.4350	0.2323
0.9077	47.32	15000	0.4515	0.2246
0.8573	50.47	16000	0.4474	0.2180
0.8573	53.63	17000	0.4649	0.2171
0.8083	56.78	18000	0.4455	0.2102
0.8083	59.94	19000	0.4587	0.2092
0.769	63.09	20000	0.4794	0.2012
0.769	66.25	21000	0.4845	0.2007
0.7308	69.4	22000	0.4937	0.2008
0.7308	72.55	23000	0.4920	0.1895
0.6927	75.71	24000	0.5179	0.1911
0.6927	78.86	25000	0.5202	0.1877
0.6622	82.02	26000	0.5266	0.1840
0.6622	85.17	27000	0.5351	0.1854
0.6315	88.33	28000	0.5373	0.1811
0.6315	91.48	29000	0.5331	0.1792
0.6075	94.64	30000	0.5390	0.1779
0.6075	97.79	31000	0.5459	0.1773
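The Epoch and Step columns can be related through the training set size and the total train batch size. The sketch below is an approximation under the assumption that one epoch is about `n_samples // total_train_batch_size` optimizer steps; the exact count used by the training loop depends on how the data loader shards and pads batches. With 20306 training samples and a total batch size of 64, this gives 317 steps per epoch, which agrees with the logged epochs.

```python
def steps_per_epoch(n_samples: int, total_train_batch_size: int) -> int:
    """Approximate optimizer steps per epoch (an assumption, see the text above)."""
    return n_samples // total_train_batch_size


def epoch_at_step(step: int, n_samples: int, total_train_batch_size: int) -> float:
    """Fractional epoch reached after `step` optimizer steps, rounded as in the log."""
    return round(step / steps_per_epoch(n_samples, total_train_batch_size), 2)


if __name__ == "__main__":
    # Small dataset (first training): 20306 samples, total batch size 64.
    print(steps_per_epoch(20306, 64))
    print(epoch_at_step(1000, 20306, 64), epoch_at_step(31000, 20306, 64))
```

The same arithmetic with the 90777 training samples of the large dataset gives 1418 steps per epoch, so step 85000 falls at about epoch 59.94.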

Second training

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 4
total_train_batch_size: 64
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 60.0
mixed_precision_training: Native AMP
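To make the `linear` scheduler with 1000 warmup steps concrete, the sketch below computes the learning rate at a given step: a linear ramp from 0 to the base rate during warmup, then a linear decay to 0 at the end of training. It is a standalone re-implementation for illustration, not the library code, and the default `total_steps` of 85080 is an assumption (60 epochs at roughly 1418 steps per epoch).

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 3e-4,      # learning_rate from the list above
    warmup_steps: int = 1000,   # lr_scheduler_warmup_steps from the list above
    total_steps: int = 85080,   # assumed: 60 epochs * ~1418 steps per epoch
) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 500, 1000, 43040, 85080):
        print(s, linear_schedule_lr(s))
```

Under these assumptions the peak rate of 0.0003 is reached at step 1000, which is also the first logged evaluation.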

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
No log	0.7	1000	0.5371	0.3811
1.3606	1.41	2000	0.5247	0.3902
1.3606	2.12	3000	0.5126	0.3859
1.3671	2.82	4000	0.5062	0.3828
1.3671	3.53	5000	0.4979	0.3672
1.3421	4.23	6000	0.4906	0.3816
1.3421	4.94	7000	0.4784	0.3651
1.328	5.64	8000	0.4810	0.3669
1.328	6.35	9000	0.4747	0.3597
1.3109	7.05	10000	0.4813	0.3808
1.3109	7.76	11000	0.4631	0.3561
1.2873	8.46	12000	0.4603	0.3431
1.2873	9.17	13000	0.4579	0.3533
1.2661	9.87	14000	0.4471	0.3365
1.2661	10.58	15000	0.4584	0.3437
1.249	11.28	16000	0.4461	0.3454
1.249	11.99	17000	0.4482	0.3367
1.2322	12.69	18000	0.4464	0.3335
1.2322	13.4	19000	0.4427	0.3454
1.22	14.1	20000	0.4440	0.3395
1.22	14.81	21000	0.4459	0.3378
1.2044	15.51	22000	0.4406	0.3199
1.2044	16.22	23000	0.4398	0.3155
1.1913	16.92	24000	0.4237	0.3150
1.1913	17.63	25000	0.4287	0.3279
1.1705	18.34	26000	0.4253	0.3103
1.1705	19.04	27000	0.4234	0.3098
1.1564	19.75	28000	0.4174	0.3076
1.1564	20.45	29000	0.4260	0.3160
1.1461	21.16	30000	0.4235	0.3036
1.1461	21.86	31000	0.4309	0.3055
1.1285	22.57	32000	0.4264	0.3006
1.1285	23.27	33000	0.4201	0.2880
1.1135	23.98	34000	0.4131	0.2975
1.1135	24.68	35000	0.4202	0.2849
1.0968	25.39	36000	0.4105	0.2888
1.0968	26.09	37000	0.4210	0.2834
1.087	26.8	38000	0.4123	0.2843
1.087	27.5	39000	0.4216	0.2803
1.0707	28.21	40000	0.4161	0.2787
1.0707	28.91	41000	0.4186	0.2740
1.0575	29.62	42000	0.4118	0.2845
1.0575	30.32	43000	0.4243	0.2773
1.0474	31.03	44000	0.4221	0.2707
1.0474	31.73	45000	0.4138	0.2700
1.0333	32.44	46000	0.4102	0.2638
1.0333	33.15	47000	0.4162	0.2650
1.0191	33.85	48000	0.4155	0.2636
1.0191	34.56	49000	0.4129	0.2656
1.0087	35.26	50000	0.4157	0.2632
1.0087	35.97	51000	0.4090	0.2654
0.9901	36.67	52000	0.4183	0.2587
0.9901	37.38	53000	0.4251	0.2648
0.9795	38.08	54000	0.4229	0.2555
0.9795	38.79	55000	0.4176	0.2546
0.9644	39.49	56000	0.4223	0.2513
0.9644	40.2	57000	0.4244	0.2530
0.9534	40.9	58000	0.4175	0.2538
0.9534	41.61	59000	0.4213	0.2505
0.9397	42.31	60000	0.4275	0.2565
0.9397	43.02	61000	0.4315	0.2528
0.9269	43.72	62000	0.4316	0.2501
0.9269	44.43	63000	0.4247	0.2471
0.9175	45.13	64000	0.4376	0.2469
0.9175	45.84	65000	0.4335	0.2450
0.9026	46.54	66000	0.4336	0.2452
0.9026	47.25	67000	0.4400	0.2427
0.8929	47.95	68000	0.4382	0.2429
0.8929	48.66	69000	0.4361	0.2415
0.8786	49.37	70000	0.4413	0.2398
0.8786	50.07	71000	0.4392	0.2415
0.8714	50.78	72000	0.4345	0.2406
0.8714	51.48	73000	0.4475	0.2402
0.8589	52.19	74000	0.4473	0.2374
0.8589	52.89	75000	0.4457	0.2357
0.8493	53.6	76000	0.4462	0.2366
0.8493	54.3	77000	0.4494	0.2356
0.8395	55.01	78000	0.4472	0.2352
0.8395	55.71	79000	0.4490	0.2339
0.8295	56.42	80000	0.4489	0.2318
0.8295	57.12	81000	0.4469	0.2320
0.8225	57.83	82000	0.4478	0.2321
0.8225	58.53	83000	0.4525	0.2326
0.816	59.24	84000	0.4532	0.2316
0.816	59.94	85000	0.4502	0.2318

Framework versions

Transformers 4.17.0.dev0
PyTorch 1.10.2+cu102
Datasets 1.18.2.dev0
Tokenizers 0.11.0