mambaformer

This model is a fine-tuned version of OuteAI/Lite-Oute-2-Mamba2Attn-Base. The training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 0.1639
Accuracy: 0.9607
Precision: 0.9628
Recall: 0.9607
F1: 0.9613
Auroc: 0.9925
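
The card does not say how these metrics were averaged. In the training results, Recall equals Accuracy in every row, and F1 sometimes falls below both Precision and Recall; both patterns are what support-weighted averaging over classes produces. The sketch below illustrates that averaging in plain Python, assuming it is the scheme used. The function name `weighted_metrics` and the toy labels are hypothetical, and AUROC is left out because it needs predicted scores rather than hard labels.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels invented for illustration; they are not from the evaluation set.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

On the toy labels, weighted recall matches accuracy exactly, while weighted F1 is lower than both weighted precision and weighted recall because it averages per-class F1 scores instead of combining the averaged precision and recall.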

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 1
label_smoothing_factor: 0.03
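
As a rough guide to reproducing this setup, the listed values can be written as keyword arguments for `transformers.TrainingArguments`. This is a sketch, not the author's training script: the mapping of `train_batch_size` and `eval_batch_size` to the per-device arguments, the single-device assumption (`NUM_DEVICES = 1`), and the `output_dir` value are all assumptions, and the helper names are hypothetical.

```python
# Values copied from the hyperparameter list; keys follow the
# transformers.TrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 32,  # assumes train_batch_size is per device
    "per_device_eval_batch_size": 4,    # assumes eval_batch_size is per device
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
    "label_smoothing_factor": 0.03,
}

NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(kwargs, num_devices=NUM_DEVICES):
    """Examples consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_arguments(output_dir="mambaformer-out"):
    """Build TrainingArguments; output_dir is a placeholder, not from the card."""
    from transformers import TrainingArguments  # imported lazily; needs transformers installed

    return TrainingArguments(output_dir=output_dir, **training_kwargs)


print(effective_batch_size(training_kwargs))
```

Under the single-device assumption, 32 examples per step times 4 accumulation steps gives the reported total_train_batch_size of 128.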

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1	Auroc
0.8973	0.0988	128	0.6661	0.6897	0.6807	0.6897	0.6850	0.5552
0.5525	0.1976	256	0.4682	0.7898	0.7526	0.7898	0.7413	0.7643
0.4086	0.2965	384	0.3500	0.8523	0.8452	0.8523	0.8472	0.9024
0.3067	0.3953	512	0.2573	0.9107	0.9085	0.9107	0.9091	0.9620
0.2477	0.4941	640	0.2234	0.9309	0.9298	0.9309	0.9288	0.9761
0.2283	0.5929	768	0.2074	0.9404	0.9396	0.9404	0.9398	0.9804
0.2035	0.6918	896	0.1875	0.9529	0.9530	0.9529	0.9530	0.9853
0.1963	0.7906	1024	0.1809	0.9464	0.9458	0.9464	0.9460	0.9867
0.1798	0.8894	1152	0.1638	0.9601	0.9610	0.9601	0.9604	0.9900
0.1749	0.9882	1280	0.1652	0.9583	0.9579	0.9583	0.9581	0.9894
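
The Epoch and Step columns allow a rough estimate of the training set size, which the card does not report. The sketch below divides the step count by the epoch fraction for the first and last rows, then multiplies by the total train batch size of 128. The result is only an estimate, since the epoch values are rounded to four decimals and the final batch of an epoch may be partial.

```python
# (epoch, step) pairs copied from the first and last rows of the results table.
logged = [(0.0988, 128), (0.9882, 1280)]
TOTAL_TRAIN_BATCH_SIZE = 128  # from the hyperparameter list


def steps_per_epoch(epoch_fraction, step):
    """Estimate optimizer steps in one full epoch from a logged checkpoint."""
    return step / epoch_fraction


estimates = [steps_per_epoch(epoch, step) for epoch, step in logged]
approx_steps = round(sum(estimates) / len(estimates))
approx_examples = approx_steps * TOTAL_TRAIN_BATCH_SIZE

print(approx_steps, approx_examples)
```

Both rows point to about 1,295 optimizer steps per epoch, which at 128 examples per step suggests a training set of roughly 166,000 examples. The table also shows that evaluation ran every 128 steps.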

Framework versions

Transformers 4.43.0.dev0
PyTorch 2.4.0+cu124
Datasets 2.19.1
Tokenizers 0.19.1

Repository: binh230/mambaformer