metadata

license: llama3
base_model: meta-llama/Meta-Llama-3-8B
tags:
  - trl
  - reward-trainer
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: rm_llama3_8B_helpsteer2
    results: []

rm_llama3_8B_helpsteer2

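This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B, trained as a reward model with TRL's reward trainer (per the tags above). The training dataset is not recorded in this card (the model name suggests HelpSteer2, but the card does not confirm it). It achieves the following results on the evaluation set at the final training step (step 950):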

Loss: 1.1203
Accuracy: 0.6339
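
For readers unfamiliar with how these two numbers are usually computed for a reward model, here is a minimal sketch. It assumes the standard pairwise objective, -log(sigmoid(r_chosen - r_rejected)), with accuracy defined as the fraction of pairs where the chosen response scores higher. The function names and the example scores are hypothetical and are not taken from this model.

```python
import math


def pairwise_loss(chosen: float, rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(chosen - rejected))."""
    margin = chosen - rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow for large |m|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def evaluate(pairs):
    """Return (mean loss, accuracy) over (chosen_score, rejected_score) pairs."""
    losses = [pairwise_loss(c, r) for c, r in pairs]
    correct = sum(1 for c, r in pairs if c > r)
    return sum(losses) / len(pairs), correct / len(pairs)


# Hypothetical reward scores for four preference pairs (illustration only).
example_pairs = [(2.0, 0.5), (0.1, 0.3), (1.2, 1.2), (-0.4, -2.0)]
mean_loss, accuracy = evaluate(example_pairs)

# A model that scores both responses equally has a loss of ln(2).
baseline = pairwise_loss(0.0, 0.0)
```

Under this assumption, the baseline for an uninformative model is ln(2), about 0.693. The reported loss of 1.1203 is above that baseline while accuracy stays above 0.5, which would be consistent with a model that is confidently wrong on the pairs it misranks.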

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 32
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 10
num_epochs: 3
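
To make the learning-rate settings concrete, the sketch below reproduces the shape of a linear schedule with warmup: the rate rises linearly from 0 to learning_rate over the warmup steps, then decays linearly to 0 at the end of training. The total of 954 steps is an assumption inferred from the training results table (50 steps correspond to 0.1572 epochs, so roughly 318 steps per epoch over 3 epochs). The function name is hypothetical.

```python
def linear_lr(step: int,
              base_lr: float = 1e-5,
              warmup_steps: int = 10,
              total_steps: int = 954) -> float:
    """Learning rate at a given optimizer step for linear warmup + linear decay.

    total_steps=954 is an inferred value, not one stated in the card.
    """
    if step < warmup_steps:
        # Warmup phase: ramp up from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp down from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 5, 10, 482, 954):
        print(s, linear_lr(s))
```

With only 10 warmup steps, the model trains at close to the peak rate of 1e-05 for most of the first epoch.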

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.8471	0.1572	50	0.7326	0.5819
0.7455	0.3145	100	0.6821	0.5549
0.7059	0.4717	150	0.6642	0.6050
0.6926	0.6289	200	0.6707	0.5915
0.6683	0.7862	250	0.6506	0.6320
0.6727	0.9434	300	0.6456	0.6224
0.6290	1.1006	350	0.6218	0.6551
0.5446	1.2579	400	0.6604	0.6281
0.5377	1.4151	450	0.6345	0.6455
0.5555	1.5723	500	0.6145	0.6320
0.5645	1.7296	550	0.6178	0.6474
0.5392	1.8868	600	0.6323	0.6532
0.4505	2.0440	650	0.7539	0.6455
0.1406	2.2013	700	1.0884	0.6339
0.1487	2.3585	750	1.1136	0.6339
0.1493	2.5157	800	1.1202	0.6358
0.1259	2.6730	850	1.1253	0.6320
0.1382	2.8302	900	1.1189	0.6320
0.1448	2.9874	950	1.1203	0.6339
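
The table shows training loss dropping sharply after epoch 2 while validation loss rises, so the final-step metrics reported at the top of the card are not the best ones observed. The sketch below copies the rows from the table and picks out the evaluation steps with the lowest validation loss and the highest accuracy. The `Row` type and variable names are hypothetical helpers for illustration.

```python
from collections import namedtuple

# Column order matches the table: Training Loss, Epoch, Step, Validation Loss, Accuracy.
Row = namedtuple("Row", ["train_loss", "epoch", "step", "val_loss", "accuracy"])

ROWS = [
    Row(0.8471, 0.1572, 50, 0.7326, 0.5819),
    Row(0.7455, 0.3145, 100, 0.6821, 0.5549),
    Row(0.7059, 0.4717, 150, 0.6642, 0.6050),
    Row(0.6926, 0.6289, 200, 0.6707, 0.5915),
    Row(0.6683, 0.7862, 250, 0.6506, 0.6320),
    Row(0.6727, 0.9434, 300, 0.6456, 0.6224),
    Row(0.6290, 1.1006, 350, 0.6218, 0.6551),
    Row(0.5446, 1.2579, 400, 0.6604, 0.6281),
    Row(0.5377, 1.4151, 450, 0.6345, 0.6455),
    Row(0.5555, 1.5723, 500, 0.6145, 0.6320),
    Row(0.5645, 1.7296, 550, 0.6178, 0.6474),
    Row(0.5392, 1.8868, 600, 0.6323, 0.6532),
    Row(0.4505, 2.0440, 650, 0.7539, 0.6455),
    Row(0.1406, 2.2013, 700, 1.0884, 0.6339),
    Row(0.1487, 2.3585, 750, 1.1136, 0.6339),
    Row(0.1493, 2.5157, 800, 1.1202, 0.6358),
    Row(0.1259, 2.6730, 850, 1.1253, 0.6320),
    Row(0.1382, 2.8302, 900, 1.1189, 0.6320),
    Row(0.1448, 2.9874, 950, 1.1203, 0.6339),
]

best_by_loss = min(ROWS, key=lambda r: r.val_loss)
best_by_accuracy = max(ROWS, key=lambda r: r.accuracy)
final = ROWS[-1]

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("highest accuracy:", best_by_accuracy)
    print("final step:", final)
```

By these numbers, the lowest validation loss (0.6145) occurs at step 500 and the highest accuracy (0.6551) at step 350, both well before the final step. The card does not say whether an earlier checkpoint was kept; in the Transformers Trainer, the `load_best_model_at_end` and `metric_for_best_model` training arguments are one way to retain such a checkpoint.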

Framework versions

Transformers 4.43.4
Pytorch 2.2.1+cu121
Datasets 2.19.2
Tokenizers 0.19.1
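
When trying to reproduce this setup, it can help to compare the installed packages against the versions listed above. The following is a minimal sketch using only the standard library. The `PINNED` mapping and the function name are hypothetical, and the mapping assumes the usual PyPI distribution names (`torch` for Pytorch).

```python
from importlib import metadata

# Versions copied from the list above, keyed by PyPI distribution name.
PINNED = {
    "transformers": "4.43.4",
    "torch": "2.2.1+cu121",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def version_mismatches(pinned=PINNED, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package that differs.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for package, (expected, installed) in version_mismatches().items():
        print(f"{package}: expected {expected}, found {installed}")
```

The comparison is an exact string match, so a torch build with a different CUDA suffix is reported as a mismatch even if the base version is the same.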