Model save

Files changed (12)

README.md CHANGED

@@ -1,7 +1,7 @@
 ---
-base_model: mistralai/Mistral-Small-Instruct-2409
 library_name: transformers
-model_name: self-correct_mistral-small-it_mMQA_dpo_iter1
 tags:
 - generated_from_trainer
 - trl
@@ -9,9 +9,9 @@ tags:
 licence: license
 ---
-# Model Card for self-correct_mistral-small-it_mMQA_dpo_iter1
-This model is a fine-tuned version of [mistralai/Mistral-Small-Instruct-2409](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 ## Quick start
@@ -20,14 +20,14 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/y5iah9gz)
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).

 ---
+base_model: RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1
 library_name: transformers
+model_name: self-correct_mistral-small-it_mMQA_dpo_iter2
 tags:
 - generated_from_trainer
 - trl
 licence: license
 ---
+# Model Card for self-correct_mistral-small-it_mMQA_dpo_iter2
+This model is a fine-tuned version of [RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1](https://huggingface.co/RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 ## Quick start
 from transformers import pipeline
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 ## Training procedure
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/gfixme5v)
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
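
The README above names DPO as the training method, but the diff shows neither the loss nor its hyperparameters. For orientation only, here is a minimal sketch of the per-example DPO loss from the linked paper, written in plain Python. The function name `dpo_loss`, the default `beta=0.1`, and the log-probability inputs are illustrative assumptions, not values taken from this run. In practice TRL computes this on batched tensors.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    Inputs are summed token log-probabilities of the chosen / rejected
    responses under the policy being trained and under the frozen reference
    model. beta=0.1 is an assumed value, not one read from this commit.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)
# Raising the chosen log-prob and lowering the rejected one reduces the loss.
improved = dpo_loss(-0.5, -3.0, -1.0, -2.0)
```

If each iteration is trained against the previous checkpoint (as the `base_model` change suggests, though the diff does not state which reference model was used), the `ref_*` terms for iter2 would come from iter1.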

last_checkpoint/config.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "mistralai/Mistral-Small-Instruct-2409",
   "architectures": [
     "MistralForCausalLM"
   ],

 {
+  "_name_or_path": "RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1",
   "architectures": [
     "MistralForCausalLM"
   ],

last_checkpoint/model-00001-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aeed4aa6b7c17fcffac0032e984eb89780fafaf36ec0c2e1dada35fcddb3e8b3
 size 4882311064

 version https://git-lfs.github.com/spec/v1
+oid sha256:431f1be976a33a5d52716733a8a9c6ba5df3d62a0ebaee82ffb8362bfdde9f5c
 size 4882311064

last_checkpoint/model-00002-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1ff18a680bbf93e0138772fe4b8f788e6c222638240878a32efb24ea1821df9e
 size 4983012160

 version https://git-lfs.github.com/spec/v1
+oid sha256:d1c1b7d9e99d48c95d806813a78258a831d9da8ef899fa41e35ba0bfdceb8806
 size 4983012160

last_checkpoint/model-00003-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f6c2f03042fd3ae5f5fe1f921890151b7e18da6620f2a0e18bc497c9d0b46685
 size 4957821336

 version https://git-lfs.github.com/spec/v1
+oid sha256:1b8d52900af74c984f330d7026c69395a1be24e691312ac577f8ed0158942671
 size 4957821336

last_checkpoint/model-00004-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c6c5091cd089808cd9f01df1f3d9646104191c1442fded7fd62579ecdd578100
 size 4882323744

 version https://git-lfs.github.com/spec/v1
+oid sha256:bc1cdf7982b3633fc557ad92589aa157da82578c0096d61c1c8b84a42aade052
 size 4882323744

last_checkpoint/model-00005-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b0fe3e3dfa53e31a422dac36b8b90434b506421979a1e4deb6f101646f50b082
 size 4983012192

 version https://git-lfs.github.com/spec/v1
+oid sha256:13c153bd710e16a67e7b4b610cd1880b582daebcede35f4c996fc5344600cfbf
 size 4983012192

last_checkpoint/model-00006-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aba801e1ba41c5cb2f6a84853f6c4c815ac432c7254bfa3759fa0fad06023c9f
 size 4957821336

 version https://git-lfs.github.com/spec/v1
+oid sha256:5e9fc9c940d40dc83b86f47523320d042697b4a7b0d52e0745001c242e8bed3b
 size 4957821336

last_checkpoint/model-00007-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:29235cde971d316703d9b7e990841680f877ab1aee6ac21f6606635e959ec56e
 size 4882323744

 version https://git-lfs.github.com/spec/v1
+oid sha256:88713acec4c4392ec0444e8933c449118b6c702f92982ba9a8e9e6af77eea46a
 size 4882323744

last_checkpoint/model-00008-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4f5177d619b2d5e51c5bd7d5aafb8f75133dc00e0a8258b453755437b4065d5e
 size 4983012192

 version https://git-lfs.github.com/spec/v1
+oid sha256:a1b022ad8cf464d383238aa9de11bf2a0f8ec7c3a18b14574716c428d3aa399c
 size 4983012192

last_checkpoint/model-00009-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:eb42b681a18a7add5077ba0332928909f0d6933d1412b16c66702509dde7aefc
 size 4983011344

 version https://git-lfs.github.com/spec/v1
+oid sha256:4829e5aded6e120f6f81339907b86534b6ce54562816c69c730c78a611b2ed45
 size 4983011344

last_checkpoint/training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:26b7969b3af7288627f8b855398c93e98127d8f74fc9deec993b0ec8c6ec6cdc
 size 7992

 version https://git-lfs.github.com/spec/v1
+oid sha256:b38d385f174117a5f05346643acdb5c08276ca9ac27d00272d8db377debd4d21
 size 7992
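
The binary files above appear only as Git LFS pointers (`version`, `oid sha256:...`, `size`), so the diff shows that each file's content hash changed while its byte size stayed the same. As a hedged sketch, a downloaded file could be checked against its pointer as below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of Git LFS or `huggingface_hub`.

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse the text of a Git LFS pointer file into a small dict."""
    fields = dict(
        line.strip().split(" ", 1)
        for line in text.splitlines()
        if line.strip()
    )
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 named by `pointer`."""
    if pointer["algo"] != "sha256":
        raise ValueError("unsupported hash algorithm: " + pointer["algo"])
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Stream in chunks so multi-gigabyte shards are never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["oid"]

# Pointer text copied from the new side of last_checkpoint/training_args.bin above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:b38d385f174117a5f05346643acdb5c08276ca9ac27d00272d8db377debd4d21\n"
    "size 7992\n"
)
```

Calling `matches_pointer("last_checkpoint/training_args.bin", pointer)` on a local copy would then confirm that the download corresponds to this commit rather than the previous one. The local path here is an assumption about where the file was saved.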