bikalnetomi/rlhf-ppo-llama31-8B-Reward-model-lora-r64-bikal

Generated from Trainer

Inference Endpoints


bikalnetomi committed on Nov 29, 2024

Commit acafd13 (verified) · 1 parent: 1f0c0a0

Update README.md

Files changed (1)

README.md +1 -1

@@ -12,7 +12,7 @@ licence: license
 # Model Card for rlhf-ppo-llama31-8B-Reward-model-lora-r64-bikal
 This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
-It has been trained using [TRL](https://github.com/huggingface/trl) with ultrafeedback-Binarized Dataset(trl-lib/ultrafeedback_binarized)
+It has been trained using [TRL](https://github.com/huggingface/trl) with [ultrafeedback-Binarized Dataset](trl-lib/ultrafeedback_binarized)
 ## Quick start