RLAIF-V-Dataset

This model is a fine-tuned version of llava-hf/llava-v1.6-mistral-7b-hf, trained on the RLAIF-V-Dataset. It achieves the following results on the evaluation set:

Loss: 0.4467
Rewards/chosen: -3.1988
Rewards/rejected: -5.9606
Rewards/accuracies: 0.8163
Rewards/margins: 2.7618
Logps/rejected: -218.4866
Logps/chosen: -190.4653
Logits/rejected: -2.3732
Logits/chosen: -2.4055
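
The reported reward margin is consistent with the difference between the chosen and rejected rewards. The following is a minimal check, assuming the margin is defined as chosen reward minus rejected reward (the usual convention in preference-optimization logs, though this card does not state it):

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -3.1988
rewards_rejected = -5.9606
reported_margin = 2.7618

# Assumption: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# Tolerance covers the four-decimal rounding of the reported values.
margin_matches = abs(computed_margin - reported_margin) < 1e-3
print(f"computed margin: {computed_margin:.4f}, matches report: {margin_matches}")
```

Both rewards are negative, so the margin comes from the rejected responses being penalized more heavily than the chosen ones, not from the chosen reward being positive.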

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 4
total_train_batch_size: 256
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 3.0
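
The total batch sizes follow from the per-device sizes, the device count, and the accumulation steps. The sketch below reproduces that arithmetic and illustrates a linear-warmup cosine schedule of the kind named by `lr_scheduler_type: cosine`. The function `cosine_lr` and the value `total_steps = 878` are illustrative assumptions: the card does not list the total step count, so it is estimated from the results table (step 50 corresponds to epoch 0.1709, which suggests roughly 292.5 steps per epoch).

```python
import math

# Hyperparameters copied from the list above.
train_batch_size = 8
eval_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 4
learning_rate = 1e-6
warmup_steps = 10

# Effective batch sizes: per-device size x devices (x accumulation for training).
effective_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
effective_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step, total_steps, base_lr=learning_rate, warmup=warmup_steps):
    """Hypothetical helper: linear warmup followed by a half-cosine decay to zero."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: about 878 optimizer steps over 3 epochs, estimated from the results table.
total_steps = 878
for step in (0, 10, 300, 600, 878):
    print(step, f"{cosine_lr(step, total_steps):.3e}")
```

Under these assumptions the learning rate peaks at 1e-06 once warmup ends at step 10 and decays toward zero by the final step.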

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.5777	0.1709	50	0.5813	-0.4541	-1.0668	0.6683	0.6127	-169.5483	-163.0182	-2.5153	-2.5221
0.4982	0.3419	100	0.5161	-0.9806	-2.1974	0.7212	1.2168	-180.8539	-168.2832	-2.4606	-2.4847
0.4954	0.5128	150	0.4770	-1.5352	-3.2803	0.7548	1.7451	-191.6833	-173.8291	-2.0991	-2.1473
0.4567	0.6838	200	0.4598	-1.1951	-2.8406	0.7596	1.6455	-187.2865	-170.4288	-2.1090	-2.1587
0.4873	0.8547	250	0.4487	-1.9205	-3.6640	0.7635	1.7435	-195.5203	-177.6819	-2.5457	-2.5724
0.2176	1.0256	300	0.4383	-1.1991	-3.1202	0.7846	1.9211	-190.0823	-170.4688	-2.3130	-2.3490
0.2095	1.1966	350	0.4537	-2.3545	-4.8732	0.7933	2.5188	-207.6123	-182.0219	-2.3656	-2.3942
0.1952	1.3675	400	0.4353	-1.9722	-4.1870	0.7962	2.2148	-200.7505	-178.1995	-2.3058	-2.3361
0.1819	1.5385	450	0.4321	-2.0466	-4.4416	0.8077	2.3950	-203.2960	-178.9431	-2.2282	-2.2612
0.1932	1.7094	500	0.4247	-1.8597	-4.1324	0.8087	2.2727	-200.2041	-177.0739	-2.2659	-2.2970
0.1921	1.8803	550	0.4131	-2.3219	-4.8505	0.8183	2.5286	-207.3855	-181.6965	-2.3691	-2.3985
0.0868	2.0513	600	0.4392	-2.7792	-5.2414	0.8135	2.4623	-211.2946	-186.2690	-2.4330	-2.4615
0.0825	2.2222	650	0.4447	-3.2209	-6.0852	0.8154	2.8642	-219.7319	-190.6867	-2.3962	-2.4295
0.0925	2.3932	700	0.4449	-3.2092	-6.0685	0.8183	2.8593	-219.5651	-190.5695	-2.3854	-2.4189
0.0754	2.5641	750	0.4567	-3.3570	-6.0710	0.8115	2.7141	-219.5908	-192.0472	-2.3789	-2.4105
0.0707	2.7350	800	0.4484	-3.2447	-6.0070	0.8135	2.7622	-218.9498	-190.9248	-2.3739	-2.4066
0.0739	2.9060	850	0.4468	-3.2032	-5.9670	0.8173	2.7638	-218.5504	-190.5096	-2.3732	-2.4054
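
The table shows validation loss reaching its minimum partway through training while training loss keeps falling, which may indicate overfitting in the third epoch. The sketch below selects the logged evaluation step with the lowest validation loss. The tuple layout and variable names are illustrative, and the card does not say which checkpoint was actually released.

```python
# (step, training_loss, validation_loss, rewards_accuracy), copied from the table above.
rows = [
    (50, 0.5777, 0.5813, 0.6683),
    (100, 0.4982, 0.5161, 0.7212),
    (150, 0.4954, 0.4770, 0.7548),
    (200, 0.4567, 0.4598, 0.7596),
    (250, 0.4873, 0.4487, 0.7635),
    (300, 0.2176, 0.4383, 0.7846),
    (350, 0.2095, 0.4537, 0.7933),
    (400, 0.1952, 0.4353, 0.7962),
    (450, 0.1819, 0.4321, 0.8077),
    (500, 0.1932, 0.4247, 0.8087),
    (550, 0.1921, 0.4131, 0.8183),
    (600, 0.0868, 0.4392, 0.8135),
    (650, 0.0825, 0.4447, 0.8154),
    (700, 0.0925, 0.4449, 0.8183),
    (750, 0.0754, 0.4567, 0.8115),
    (800, 0.0707, 0.4484, 0.8135),
    (850, 0.0739, 0.4468, 0.8173),
]

# Evaluation step with the lowest validation loss.
best_by_loss = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
last = rows[-1]
final_gap = last[2] - last[1]

print(f"lowest validation loss: {best_by_loss[2]} at step {best_by_loss[0]}")
print(f"validation minus training loss at step {last[0]}: {final_gap:.4f}")
```

Reward accuracy at that step (0.8183) is also the highest value in the table, tied with step 700.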

Framework versions

Transformers 4.45.2
PyTorch 2.4.0+cu121
Datasets 2.21.0
Tokenizers 0.20.3
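
To compare a local environment against the versions listed above, a small check along these lines may help. The helper `check_versions` is hypothetical, and the PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to correspond to the frameworks named in this card.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed distribution name.
EXPECTED = {
    "transformers": "4.45.2",
    "torch": "2.4.0+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.20.3",
}


def check_versions(expected):
    """Return {name: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the check only reports how the environment differs from the one used for training.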

htlou/mm-interp-RLAIF-V-Dataset