End of training
Browse files
- README.md +1 -1
- all_results.json +1 -1
- eval_results.json +1 -1
README.md
CHANGED
@@ -26,7 +26,7 @@ print(output["generated_text"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/shahrad_m/AIFGen-ppo-continual-test/runs/
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/shahrad_m/AIFGen-ppo-continual-test/runs/ufysmsjb)
 
 
 This model was trained with PPO, a method introduced in [Fine-Tuning Language Models from Human Preferences](https://huggingface.co/papers/1909.08593).
all_results.json
CHANGED
@@ -1,4 +1,4 @@
 {
     "dataset": 1,
-    "eval_score": 1.
+    "eval_score": 1.718251347541809
 }
eval_results.json
CHANGED
@@ -1,4 +1,4 @@
 {
     "dataset": 1,
-    "eval_score": 1.718251347541809
 }
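As a quick sanity check of the corrected results file, the updated eval_results.json can be round-tripped with the standard library. This is a minimal sketch, assuming the file lives in the current working directory; the dictionary below simply mirrors the contents shown in the diff above.

```python
import json

# Mirror of the corrected eval_results.json contents from the diff above
results = {
    "dataset": 1,
    "eval_score": 1.718251347541809,
}

# Write the file, then read it back the way a downstream script would
with open("eval_results.json", "w") as f:
    json.dump(results, f, indent=4)

with open("eval_results.json") as f:
    loaded = json.load(f)

print(loaded["eval_score"])  # the PPO eval score recorded at end of training
```

Note that the pre-fix value `"eval_score": 1.` (a bare trailing decimal point) is not valid JSON, so `json.load` would have rejected the old file; the commit makes it parseable again.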