This is a duplicate of the work of
wav2vec2-base-Speech_Emotion_Recognition.
Only minor changes were made so that it runs successfully on Google Colab.
My version of the metrics:
| Epoch | Training Loss | Validation Loss | Accuracy | Weighted F1 | Micro F1 | Macro F1 | Weighted Recall | Micro Recall | Macro Recall | Weighted Precision | Micro Precision | Macro Precision |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.789200 | 1.548816 | 0.382590 | 0.287415 | 0.382590 | 0.289045 | 0.382590 | 0.382590 | 0.379768 | 0.473585 | 0.382590 | 0.467116 |
1 | 1.789200 | 1.302810 | 0.529823 | 0.511868 | 0.529823 | 0.511619 | 0.529823 | 0.529823 | 0.523766 | 0.552868 | 0.529823 | 0.560496 |
2 | 1.789200 | 1.029921 | 0.672757 | 0.668108 | 0.672757 | 0.669246 | 0.672757 | 0.672757 | 0.676383 | 0.674857 | 0.672757 | 0.673698 |
3 | 1.789200 | 0.968154 | 0.677055 | 0.671986 | 0.677055 | 0.674074 | 0.677055 | 0.677055 | 0.676891 | 0.701300 | 0.677055 | 0.705734 |
4 | 1.789200 | 0.850912 | 0.717894 | 0.714321 | 0.717894 | 0.716527 | 0.717894 | 0.717894 | 0.722476 | 0.716772 | 0.717894 | 0.716698 |
5 | 1.789200 | 0.870916 | 0.710371 | 0.706013 | 0.710371 | 0.708563 | 0.710371 | 0.710371 | 0.713853 | 0.710966 | 0.710371 | 0.712245 |
6 | 1.789200 | 0.827148 | 0.729178 | 0.725336 | 0.729178 | 0.726744 | 0.729178 | 0.729178 | 0.732127 | 0.735935 | 0.729178 | 0.736041 |
7 | 1.789200 | 0.798354 | 0.729715 | 0.727086 | 0.729715 | 0.728847 | 0.729715 | 0.729715 | 0.732476 | 0.729932 | 0.729715 | 0.730688 |
8 | 1.789200 | 0.799373 | 0.735626 | 0.732981 | 0.735626 | 0.735058 | 0.735626 | 0.735626 | 0.738147 | 0.741482 | 0.735626 | 0.742782 |
9 | 1.789200 | 0.810692 | 0.728103 | 0.724754 | 0.728103 | 0.726852 | 0.728103 | 0.728103 | 0.731083 | 0.731919 | 0.728103 | 0.732869 |
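The table reports three averages (weighted, micro, macro) for each metric. As a reference, here is a minimal pure-Python sketch of how they differ for single-label multiclass data (an illustration with made-up labels, not the project's training code):

```python
# Illustrative sketch (not the project's code): per-class F1 plus the
# micro/macro/weighted averages reported in the table above.
from collections import Counter

def f1_scores(y_true, y_pred):
    """Per-class F1 and micro/macro/weighted averages for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        per_class[c] = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    n = len(y_true)
    macro = sum(per_class.values()) / len(labels)          # classes weighted equally
    weighted = sum(per_class[c] * support[c] / n for c in labels)  # by class support
    # For single-label multiclass data, micro F1 collapses to plain accuracy.
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return per_class, micro, macro, weighted

# Toy example (made-up labels, chosen only to show the three averages diverge):
y_true = ["angry", "angry", "happy", "sad", "sad", "sad"]
y_pred = ["angry", "happy", "happy", "sad", "sad", "angry"]
per_class, micro, macro, weighted = f1_scores(y_true, y_pred)
```

This is why the Accuracy, Micro F1, Micro Recall, and Micro Precision columns above are identical: micro averaging pools all decisions into one count, while macro treats every class equally and weighted scales each class by its support.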
Final evaluation: Num examples = 1861, Batch size = 32 [59/59 08:38]
{'eval_loss': 0.8106924891471863,
'eval_accuracy': 0.7281031703385277,
'eval_Weighted F1': 0.7247543780750472,
'eval_Micro F1': 0.7281031703385277,
'eval_Macro F1': 0.7268519957485492,
'eval_Weighted Recall': 0.7281031703385277,
'eval_Micro Recall': 0.7281031703385277,
'eval_Macro Recall': 0.7310833557439055,
'eval_Weighted Precision': 0.7319188411210771,
'eval_Micro Precision': 0.7281031703385277,
'eval_Macro Precision': 0.732869407033253,
'eval_runtime': 83.3066,
'eval_samples_per_second': 22.339,
'eval_steps_per_second': 0.708,
'epoch': 9.98}
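The throughput figures in the output above are internally consistent; a quick arithmetic check (the values are copied from the eval output, while the formulas are my assumption about how they relate, not Trainer internals):

```python
# Sanity check of the evaluation throughput figures reported above.
num_examples = 1861
batch_size = 32
eval_runtime = 83.3066  # seconds

steps = -(-num_examples // batch_size)            # ceil(1861 / 32) = 59, matching [59/59]
samples_per_second = num_examples / eval_runtime  # ~22.34, matching eval_samples_per_second
steps_per_second = steps / eval_runtime           # ~0.708, matching eval_steps_per_second
```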
Model description
This model predicts the emotion of the person speaking in the audio sample.
For more information on how it was created, check out the following link: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Audio-Projects/Emotion%20Detection/Speech%20Emotion%20Detection
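The card does not include usage code. A minimal sketch using the standard transformers audio-classification pipeline; the model id and file name below are placeholders I made up, not values from this card:

```python
# Hedged sketch: load a fine-tuned checkpoint and classify one audio file.
# "user/wav2vec2-base-speech-emotion" and "sample.wav" are placeholders.
from transformers import pipeline

classifier = pipeline("audio-classification", model="user/wav2vec2-base-speech-emotion")
predictions = classifier("sample.wav")  # list of {"label": ..., "score": ...} dicts
print(predictions[0]["label"])          # highest-scoring emotion
```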
Training and evaluation data
Dataset Source: https://www.kaggle.com/datasets/dmitrybabko/speech-emotion-recognition-en