qgallouedec/online-dpo-qwen2-4

Text Generation · Generated from Trainer · text-generation-inference · Inference Endpoints

qgallouedec (HF staff) committed on Sep 25, 2024

Commit 044170d (verified) · 1 parent: 161ccdd

End of training

Files changed (1)

README.md +1 -2

README.md CHANGED

@@ -28,8 +28,7 @@ print(output["generated_text"][1]["content"])
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="100" height="16"/>](https://wandb.ai/costa-huang/huggingface/runs/and6d28l)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/huggingface/huggingface/runs/8q6fzgyf)
 This model was trained with Online DPO, a method introduced in [Direct Language Model Alignment from Online AI Feedback](https://huggingface.co/papers/2402.04792).