|
This repository replicates the work of
|
[wav2vec2-base-Speech_Emotion_Recognition](https://huggingface.co/DunnBC22/wav2vec2-base-Speech_Emotion_Recognition) |
|
*Only minor changes were made so that it runs successfully on Google Colab.*
|
|
|
### My Version of the Metrics
|
|Epoch |Training Loss |Validation Loss |Accuracy |Weighted F1 |Micro F1 |Macro F1 |Weighted Recall |Micro Recall |Macro Recall |Weighted Precision |Micro Precision |Macro Precision |
|----|----|----|----|----|----|----|----|----|----|----|----|----|
|0 | 1.789200 | 1.548816 | 0.382590 | 0.287415 | 0.382590 | 0.289045 | 0.382590 | 0.382590 | 0.379768 | 0.473585 | 0.382590 | 0.467116 |
|1 | 1.789200 | 1.302810 | 0.529823 | 0.511868 | 0.529823 | 0.511619 | 0.529823 | 0.529823 | 0.523766 | 0.552868 | 0.529823 | 0.560496 |
|2 | 1.789200 | 1.029921 | 0.672757 | 0.668108 | 0.672757 | 0.669246 | 0.672757 | 0.672757 | 0.676383 | 0.674857 | 0.672757 | 0.673698 |
|3 | 1.789200 | 0.968154 | 0.677055 | 0.671986 | 0.677055 | 0.674074 | 0.677055 | 0.677055 | 0.676891 | 0.701300 | 0.677055 | 0.705734 |
|4 | 1.789200 | 0.850912 | 0.717894 | 0.714321 | 0.717894 | 0.716527 | 0.717894 | 0.717894 | 0.722476 | 0.716772 | 0.717894 | 0.716698 |
|5 | 1.789200 | 0.870916 | 0.710371 | 0.706013 | 0.710371 | 0.708563 | 0.710371 | 0.710371 | 0.713853 | 0.710966 | 0.710371 | 0.712245 |
|6 | 1.789200 | 0.827148 | 0.729178 | 0.725336 | 0.729178 | 0.726744 | 0.729178 | 0.729178 | 0.732127 | 0.735935 | 0.729178 | 0.736041 |
|7 | 1.789200 | 0.798354 | 0.729715 | 0.727086 | 0.729715 | 0.728847 | 0.729715 | 0.729715 | 0.732476 | 0.729932 | 0.729715 | 0.730688 |
|8 | 1.789200 | 0.799373 | 0.735626 | 0.732981 | 0.735626 | 0.735058 | 0.735626 | 0.735626 | 0.738147 | 0.741482 | 0.735626 | 0.742782 |
|9 | 1.789200 | 0.810692 | 0.728103 | 0.724754 | 0.728103 | 0.726852 | 0.728103 | 0.728103 | 0.731083 | 0.731919 | 0.728103 | 0.732869 |
|
|
|
|
|
```
***** Running Evaluation *****
  Num examples = 1861
  Batch size = 32
  [59/59 08:38]
{'eval_loss': 0.8106924891471863,
 'eval_accuracy': 0.7281031703385277,
 'eval_Weighted F1': 0.7247543780750472,
 'eval_Micro F1': 0.7281031703385277,
 'eval_Macro F1': 0.7268519957485492,
 'eval_Weighted Recall': 0.7281031703385277,
 'eval_Micro Recall': 0.7281031703385277,
 'eval_Macro Recall': 0.7310833557439055,
 'eval_Weighted Precision': 0.7319188411210771,
 'eval_Micro Precision': 0.7281031703385277,
 'eval_Macro Precision': 0.732869407033253,
 'eval_runtime': 83.3066,
 'eval_samples_per_second': 22.339,
 'eval_steps_per_second': 0.708,
 'epoch': 9.98}
```
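These metrics come from a `compute_metrics` callback passed to the Hugging Face `Trainer`. The snippet below is a minimal sketch of how such a callback can be written with scikit-learn; it is an assumption for illustration, not the exact code from the original notebook.

```python
# Minimal sketch of a compute_metrics callback for the Hugging Face Trainer.
# Assumes scikit-learn is installed; key names mirror the evaluation log above.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    metrics = {"accuracy": accuracy_score(labels, preds)}
    # Report each metric with weighted, micro, and macro averaging,
    # matching the columns in the table above.
    for avg in ("weighted", "micro", "macro"):
        metrics[f"{avg.capitalize()} F1"] = f1_score(labels, preds, average=avg)
        metrics[f"{avg.capitalize()} Recall"] = recall_score(labels, preds, average=avg)
        metrics[f"{avg.capitalize()} Precision"] = precision_score(labels, preds, average=avg)
    return metrics
```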
|
|
|
### Model description |
|
|
|
This model predicts the emotion of the person speaking in the audio sample. |
|
|
|
For more information on how it was created, check out the following link: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Audio-Projects/Emotion%20Detection/Speech%20Emotion%20Detection |
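For a quick test of the trained checkpoint, the `transformers` audio-classification pipeline works out of the box. The snippet below is a minimal sketch: `sample.wav` is a placeholder path, and the model id shown is the upstream checkpoint linked above, so swap it for this repository's own checkpoint if you want the Colab-trained weights.

```python
# Minimal inference sketch using the transformers audio-classification pipeline.
# "sample.wav" is a placeholder path to a local 16 kHz speech recording.
from transformers import pipeline

classifier = pipeline(
    "audio-classification",
    model="DunnBC22/wav2vec2-base-Speech_Emotion_Recognition",
)

predictions = classifier("sample.wav", top_k=5)
for pred in predictions:
    print(f"{pred['label']}: {pred['score']:.3f}")
```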
|
|
|
|
|
### Training and evaluation data |
|
|
|
Dataset Source: https://www.kaggle.com/datasets/dmitrybabko/speech-emotion-recognition-en |
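
The sketch below shows one possible way to load the downloaded WAV files and turn them into wav2vec2 inputs with `Wav2Vec2FeatureExtractor`. The directory layout, the label-per-folder convention, and the use of `librosa` are assumptions for illustration, not the original preprocessing code (see the linked GitHub project for that).

```python
# Hedged preprocessing sketch: load WAV files and build wav2vec2 inputs.
# Paths and the label-from-folder convention are assumptions about the
# downloaded Kaggle data, not the original notebook's exact pipeline.
from pathlib import Path

import librosa
from transformers import Wav2Vec2FeatureExtractor

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
TARGET_SR = 16_000  # wav2vec2-base expects 16 kHz mono audio


def load_example(wav_path: Path):
    # Resample to 16 kHz mono and convert to the model's input format.
    speech, _ = librosa.load(wav_path, sr=TARGET_SR, mono=True)
    inputs = feature_extractor(speech, sampling_rate=TARGET_SR, return_tensors="pt")
    label = wav_path.parent.name  # assumed: one folder per emotion
    return inputs, label


# Example usage over a hypothetical extracted dataset directory.
for wav_file in sorted(Path("speech-emotion-recognition-en").rglob("*.wav"))[:3]:
    inputs, label = load_example(wav_file)
    print(wav_file.name, label, inputs["input_values"].shape)
```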