qgallouedec/online-dpo-qwen2-2


qgallouedec (HF staff) committed on Sep 25

Commit 00dc223 • 1 Parent(s): e4af263

Training in progress, epoch 1

Files changed (3)

README.md +1 -2
model.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -9,7 +9,6 @@ tags:
 model-index:
 - name: online-dpo-qwen2-2
   results: []
-datasets: trl-lib/ultrafeedback-prompt
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -17,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 # online-dpo-qwen2-2
-This model is a fine-tuned version of [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt dataset.
+This model is a fine-tuned version of [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the trl-lib/ultrafeedback-prompt dataset.
 ## Model description

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cc30bd7cffc88b143ac34a77a3fe10ece13020674edc657b48a79f832d0af553
+oid sha256:444040b9c85172fac0a3f53a2832ab8098ff0841854c638a600dcb57e9311378
 size 1976163472

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e459bdf9c4e6a0da7c2a4e9f5cc66532e6cce964b78dd05d35c5cd8191d60176
+oid sha256:4bb16a66c3613e679403f9fa00edfd1ce7eb179a9b342ce70e788e50de1fc7fd
 size 5432
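
The model.safetensors and training_args.bin entries in this commit are Git LFS pointer files, so the diff shows only `version`, `oid`, and `size` lines, not the binary contents. As a rough sketch (not part of the commit), the old and new model.safetensors pointers could be compared as below. `parse_lfs_pointer` is a hypothetical helper name introduced here for illustration; the pointer text is copied from the diff above.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>", e.g. "size 1976163472".
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents for model.safetensors before and after commit 00dc223.
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:cc30bd7cffc88b143ac34a77a3fe10ece13020674edc657b48a79f832d0af553
size 1976163472
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:444040b9c85172fac0a3f53a2832ab8098ff0841854c638a600dcb57e9311378
size 1976163472
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

# The hash changes because the weights were updated; the byte size is unchanged.
oid_changed = old["oid"] != new["oid"]
size_changed = int(old["size"]) != int(new["size"])
print(f"oid changed: {oid_changed}, size changed: {size_changed}")
```

A changed `oid` with an identical `size` is what one would expect when only the tensor values differ between checkpoints and the file layout stays the same.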