Model save

Files changed (12)

README.md +6 -6
last_checkpoint/config.json +1 -1
last_checkpoint/model-00001-of-00009.safetensors +1 -1
last_checkpoint/model-00002-of-00009.safetensors +1 -1
last_checkpoint/model-00003-of-00009.safetensors +1 -1
last_checkpoint/model-00004-of-00009.safetensors +1 -1
last_checkpoint/model-00005-of-00009.safetensors +1 -1
last_checkpoint/model-00006-of-00009.safetensors +1 -1
last_checkpoint/model-00007-of-00009.safetensors +1 -1
last_checkpoint/model-00008-of-00009.safetensors +1 -1
last_checkpoint/model-00009-of-00009.safetensors +1 -1
last_checkpoint/training_args.bin +2 -2

README.md CHANGED

@@ -1,7 +1,7 @@
 ---
-base_model: RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2
 library_name: transformers
-model_name: self-correct_mistral-small-it_mMQA_dpo_iter3
 tags:
 - generated_from_trainer
 - trl
@@ -9,9 +9,9 @@ tags:
 licence: license
 ---
-# Model Card for self-correct_mistral-small-it_mMQA_dpo_iter3
-This model is a fine-tuned version of [RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2](https://huggingface.co/RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 ## Quick start
@@ -20,14 +20,14 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter3", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/3nyg0g8s)
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).

 ---
+base_model: RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1
 library_name: transformers
+model_name: self-correct_mistral-small-it_mMQA_dpo_iter2_ref-iter1
 tags:
 - generated_from_trainer
 - trl
 licence: license
 ---
+# Model Card for self-correct_mistral-small-it_mMQA_dpo_iter2_ref-iter1
+This model is a fine-tuned version of [RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1](https://huggingface.co/RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 ## Quick start
 from transformers import pipeline
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2_ref-iter1", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 ## Training procedure
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/v7iyqyhs)
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).

last_checkpoint/config.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2",
   "architectures": [
     "MistralForCausalLM"
   ],

 {
+  "_name_or_path": "RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1",
   "architectures": [
     "MistralForCausalLM"
   ],

last_checkpoint/model-00001-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:00fe1bb21888156b09cb0852bc3d4da8392320b3834a1b73a49b220e394813f5
 size 4882311064

 version https://git-lfs.github.com/spec/v1
+oid sha256:5cd193c02a5656c0e5287a59dadcc574a3e02da8b03b7c569fc91b28d2c3cd6f
 size 4882311064

last_checkpoint/model-00002-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:258480b3307ec83d650be048c1694d12bb70e9110cead3fe0b2986d17a99d19c
 size 4983012160

 version https://git-lfs.github.com/spec/v1
+oid sha256:b280d780d7433a36bbc5674a26913e9bc8129d36abe83de11c70af59297049c0
 size 4983012160

last_checkpoint/model-00003-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:76eebe466e4989d58a77acededc5cc12bd7463bfdde7c16a4fdfdc38148ccd89
 size 4957821336

 version https://git-lfs.github.com/spec/v1
+oid sha256:873485e78c41e0b68f0c4e35d89a7976da37b9fd05be8c60608b662f484e90fd
 size 4957821336

last_checkpoint/model-00004-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8d2bd6e5e4a5b55a67cab6188d90f3e0c9500638905db3472d19e533a7118690
 size 4882323744

 version https://git-lfs.github.com/spec/v1
+oid sha256:a288396320b22dab7761c435a2b80469ee5fb921e9e63340a9917e0ae7199c4c
 size 4882323744

last_checkpoint/model-00005-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8d81f1d814ca73d528def3c26298510713ea6fb80cacc8ca9aca4540bb57495b
 size 4983012192

 version https://git-lfs.github.com/spec/v1
+oid sha256:3da42476985b8196f769e9c2622fc3ff6cb18fa4a0fe74f264949a314e6453cb
 size 4983012192

last_checkpoint/model-00006-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c5451f251a32de94b088f1a4ec933fc66862b0bcc12391c3fcc814ae61ed0eff
 size 4957821336

 version https://git-lfs.github.com/spec/v1
+oid sha256:92a9dcecfbc64df07fc725985251c7d77ff01ddf40e78c976560ccaddb655b41
 size 4957821336

last_checkpoint/model-00007-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e1b8acf4839e0d24873983f507065309fe476c94853513eddf8d37dc88bbd111
 size 4882323744

 version https://git-lfs.github.com/spec/v1
+oid sha256:9322c4eed8ffde8e7e0dff8e23e0f46f1dcfec26629ab8bd161f19c0c6c5b581
 size 4882323744

last_checkpoint/model-00008-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8a7b1b8814fad19ec019466fd555896d9beb5ea014b9d8da45c54eb46828a2bf
 size 4983012192

 version https://git-lfs.github.com/spec/v1
+oid sha256:98d34e4b6504b2bc96f0c91c6e9fe99852f220a234c713c0ef96e268508ce2fc
 size 4983012192

last_checkpoint/model-00009-of-00009.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1c5ef663d3b7080f9699410566f57fbc08a5277a6f745fe32c226aae51b0f7fc
 size 4983011344

 version https://git-lfs.github.com/spec/v1
+oid sha256:3311b1902f6785073b28612644caf873a8626f257000e8f9d7014da37271116b
 size 4983011344

last_checkpoint/training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:275dbb49bf52c523a077745889de2c4f134ad827e06aa7db4c2090c7f1f19452
-size 7992

 version https://git-lfs.github.com/spec/v1
+oid sha256:df47e9f8c5a16a9a09907e8106037fb79a44b429595bed66e27edd5e0695b632
+size 8056
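
Note: the `.safetensors` and `training_args.bin` entries above are Git LFS pointer files, so the diff shows only a changed `oid sha256:` (and, for `training_args.bin`, a changed `size`) rather than the binary content. The following is a minimal sketch, not part of this commit, of how a downloaded file could be checked against such a pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the final comment is an assumption about where the file was downloaded.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the "key value" lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form "<algorithm>:<hex digest>", e.g. "sha256:df47...".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest the pointer records."""
    path = Path(path)
    # Cheap check first: a size mismatch means the file cannot match.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Hash in chunks so multi-gigabyte shards are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# New-side pointer for last_checkpoint/training_args.bin, copied from the diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:df47e9f8c5a16a9a09907e8106037fb79a44b429595bed66e27edd5e0695b632\n"
    "size 8056\n"
)

# Assumed local path; run this only after the file has actually been downloaded:
# matches_pointer("last_checkpoint/training_args.bin", pointer)
```

Comparing the size before hashing keeps the check cheap when a download was truncated, which matters for shards of roughly 4.9 GB each.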