Meta-Llama-3-8B-Instruct-rm-Anthropic-hh-rlhf-concateye

	---
	base_model: meta-llama/Meta-Llama-3-8B-Instruct
	library_name: peft
	license: llama3
	metrics:
	- accuracy
	tags:
	- generated_from_trainer
	model-index:
	- name: reward_model
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# reward_model

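	This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct). The Trainer did not record the training dataset (the generated card lists it as `None`); the repository name suggests Anthropic hh-rlhf preference data.
	At the final evaluation (step 3328, epoch 1.96) it achieves the following results on the evaluation set: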
	- Loss: 0.7036
	- Accuracy: 0.5236
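To put these two numbers in context, the sketch below compares them with a chance-level baseline. It assumes the model was trained with the common pairwise preference loss, `-log(sigmoid(r_chosen - r_rejected))`, which this card does not confirm; the function and variable names are illustrative only.

```python
import math


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss -log(sigmoid(chosen - rejected)).

    Assumption: this is the loss behind the reported numbers; the card
    itself does not state which loss was used.
    """
    diff = reward_chosen - reward_rejected
    # Numerically stable softplus(-diff), which equals -log(sigmoid(diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# A model that scores both responses identically sits at ln(2).
CHANCE_LOSS = pairwise_loss(0.0, 0.0)

REPORTED_LOSS = 0.7036       # from the card
REPORTED_ACCURACY = 0.5236   # from the card

loss_gap = REPORTED_LOSS - CHANCE_LOSS
accuracy_gap = REPORTED_ACCURACY - 0.5

print(f"chance-level loss: {CHANCE_LOSS:.4f}")
print(f"reported loss minus chance: {loss_gap:+.4f}")
print(f"reported accuracy minus chance: {accuracy_gap:+.4f}")
```

Under that assumption, the reported loss is slightly above the chance-level value and the accuracy is only a few points above 0.5, so the scores should be treated with caution when ranking responses.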

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 16
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.02
	- num_epochs: 2
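The learning-rate settings above can be illustrated with a small, dependency-free sketch of a linear schedule with warmup. `TOTAL_STEPS` is an assumption: the card does not state the total step count, so it is estimated from the results table (step 3328 at epoch 1.96, roughly 1698 steps per epoch). The helper names are hypothetical, not the Trainer's API.

```python
import math

BASE_LR = 5e-05        # learning_rate from the card
WARMUP_RATIO = 0.02    # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 3396     # assumption: estimated from the results table


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Number of warmup steps implied by a warmup ratio (rounded up)."""
    return math.ceil(total_steps * ratio)


def linear_lr(step: int,
              total_steps: int = TOTAL_STEPS,
              base_lr: float = BASE_LR,
              ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        return base_lr * step / max(1, warmup)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup)


if __name__ == "__main__":
    for step in (0, 34, 68, 1698, 3328, 3396):
        print(step, f"{linear_lr(step):.2e}")
```

With the estimated step count, warmup covers the first 68 steps, so nearly all of training runs on the decaying part of the schedule.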

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|
	| 0.7293 | 0.08 | 128 | 0.7252 | 0.4850 |
	| 0.7412 | 0.15 | 256 | 0.6925 | 0.5386 |
	| 0.7182 | 0.23 | 384 | 0.6954 | 0.5327 |
	| 0.6997 | 0.3 | 512 | 0.6941 | 0.5277 |
	| 0.7547 | 0.38 | 640 | 0.6959 | 0.5279 |
	| 0.7123 | 0.45 | 768 | 0.6993 | 0.5252 |
	| 0.7281 | 0.53 | 896 | 0.6962 | 0.5275 |
	| 0.7169 | 0.6 | 1024 | 0.6986 | 0.5156 |
	| 0.7244 | 0.68 | 1152 | 0.6981 | 0.5125 |
	| 0.7199 | 0.75 | 1280 | 0.7000 | 0.5060 |
	| 0.7311 | 0.83 | 1408 | 0.6959 | 0.5140 |
	| 0.7123 | 0.9 | 1536 | 0.6956 | 0.5154 |
	| 0.7344 | 0.98 | 1664 | 0.6970 | 0.5100 |
	| 0.7105 | 1.05 | 1792 | 0.6933 | 0.5219 |
	| 0.6947 | 1.13 | 1920 | 0.6944 | 0.5259 |
	| 0.7261 | 1.21 | 2048 | 0.6960 | 0.5256 |
	| 0.6997 | 1.28 | 2176 | 0.6974 | 0.5188 |
	| 0.7442 | 1.36 | 2304 | 0.6960 | 0.5163 |
	| 0.7004 | 1.43 | 2432 | 0.6987 | 0.5286 |
	| 0.7089 | 1.51 | 2560 | 0.6982 | 0.5288 |
	| 0.7142 | 1.58 | 2688 | 0.7014 | 0.5154 |
	| 0.7364 | 1.66 | 2816 | 0.6997 | 0.5202 |
	| 0.6915 | 1.73 | 2944 | 0.7043 | 0.5200 |
	| 0.7322 | 1.81 | 3072 | 0.7037 | 0.5229 |
	| 0.7524 | 1.88 | 3200 | 0.7019 | 0.5219 |
	| 0.7192 | 1.96 | 3328 | 0.7036 | 0.5236 |
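The evaluation rows above can be scanned for the best checkpoint with a few lines of Python. This is only a reading aid: the tuples copy the Step, Validation Loss, and Accuracy columns from the table, and the variable names are illustrative.

```python
# (step, validation_loss, accuracy), copied from the training results table.
EVAL_ROWS = [
    (128, 0.7252, 0.4850), (256, 0.6925, 0.5386), (384, 0.6954, 0.5327),
    (512, 0.6941, 0.5277), (640, 0.6959, 0.5279), (768, 0.6993, 0.5252),
    (896, 0.6962, 0.5275), (1024, 0.6986, 0.5156), (1152, 0.6981, 0.5125),
    (1280, 0.7000, 0.5060), (1408, 0.6959, 0.5140), (1536, 0.6956, 0.5154),
    (1664, 0.6970, 0.5100), (1792, 0.6933, 0.5219), (1920, 0.6944, 0.5259),
    (2048, 0.6960, 0.5256), (2176, 0.6974, 0.5188), (2304, 0.6960, 0.5163),
    (2432, 0.6987, 0.5286), (2560, 0.6982, 0.5288), (2688, 0.7014, 0.5154),
    (2816, 0.6997, 0.5202), (2944, 0.7043, 0.5200), (3072, 0.7037, 0.5229),
    (3200, 0.7019, 0.5219), (3328, 0.7036, 0.5236),
]

# Checkpoint with the lowest validation loss and the one with the highest accuracy.
best_by_loss = min(EVAL_ROWS, key=lambda row: row[1])
best_by_accuracy = max(EVAL_ROWS, key=lambda row: row[2])
final_row = EVAL_ROWS[-1]

print("lowest validation loss:", best_by_loss)
print("highest accuracy:", best_by_accuracy)
print("final evaluation:", final_row)
```

Both the lowest validation loss (0.6925) and the highest accuracy (0.5386) occur at step 256, while the final row at step 3328 is the one reported in the summary at the top of the card.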


	### Framework versions

	- PEFT 0.12.0
	- Transformers 4.36.0
	- PyTorch 2.2.0
	- Datasets 2.20.0
	- Tokenizers 0.15.2
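To check whether a local environment matches these versions, something like the following sketch may help. It uses only the standard library; the PyPI distribution names in `PINNED` (for example `torch` for PyTorch) are assumptions about how the packages were installed, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in the card, keyed by the assumed PyPI distribution name.
PINNED = {
    "peft": "0.12.0",
    "transformers": "4.36.0",
    "torch": "2.2.0",
    "datasets": "2.20.0",
    "tokenizers": "0.15.2",
}


def check_versions(pinned: dict) -> dict:
    """Return {name: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions(PINNED).items():
        status = "ok" if ok else "differs or missing"
        print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one recorded here.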