RyanYr committed on
Commit 394a115
1 parent: eb8b94e

Model save

README.md CHANGED
````diff
@@ -1,7 +1,7 @@
 ---
-base_model: mistralai/Mistral-Small-Instruct-2409
+base_model: RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1
 library_name: transformers
-model_name: self-correct_mistral-small-it_mMQA_dpo_iter1
+model_name: self-correct_mistral-small-it_mMQA_dpo_iter2
 tags:
 - generated_from_trainer
 - trl
@@ -9,9 +9,9 @@ tags:
 licence: license
 ---
 
-# Model Card for self-correct_mistral-small-it_mMQA_dpo_iter1
+# Model Card for self-correct_mistral-small-it_mMQA_dpo_iter2
 
-This model is a fine-tuned version of [mistralai/Mistral-Small-Instruct-2409](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409).
+This model is a fine-tuned version of [RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1](https://huggingface.co/RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 
 ## Quick start
@@ -20,14 +20,14 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline
 
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1", device="cuda")
+generator = pipeline("text-generation", model="RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/y5iah9gz)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/gfixme5v)
 
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
 
````
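The model card states that this model was trained with DPO. For a preference pair (prompt x, chosen response y_w, rejected response y_l), DPO minimizes -log sigmoid(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))]). The sketch below computes that per-pair loss from summed sequence log-probabilities in plain Python. It is an illustration of the objective only, not TRL's implementation; the function names are hypothetical, and beta = 0.1 is an illustrative value, since this commit does not record the beta or the reference model used for the iter2 run.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)), i.e. softplus(-x)."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    # For negative x, rewrite to avoid overflow in exp(-x).
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response under
    the policy being trained or under the frozen reference model.
    beta=0.1 is an illustrative assumption, not a value from this commit.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return neg_log_sigmoid(margin)


# The policy raised the chosen response's log-prob relative to the reference
# and lowered the rejected one, so the loss falls below log(2).
print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; moving probability mass toward the chosen response pushes it lower.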
 
last_checkpoint/config.json CHANGED
```diff
@@ -1,5 +1,5 @@
 {
-"_name_or_path": "mistralai/Mistral-Small-Instruct-2409",
+"_name_or_path": "RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1",
 "architectures": [
 "MistralForCausalLM"
 ],
```
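Together, the README `base_model` field and `_name_or_path` in config.json record that iter2 was initialized from iter1, which in turn (according to the removed README lines) was fine-tuned from mistralai/Mistral-Small-Instruct-2409. The small sketch below walks that chain. The mapping is hard-coded from the values in this diff and the `lineage` helper is hypothetical; it does not query the Hugging Face Hub.

```python
# Hypothetical child -> base_model map, copied from the README front matter in this diff.
BASE_MODEL = {
    "RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2": "RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1",
    "RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1": "mistralai/Mistral-Small-Instruct-2409",
}


def lineage(model_id: str, base_of: dict) -> list:
    """Follow base_model links from model_id back to a model with no recorded base."""
    chain = [model_id]
    seen = {model_id}
    while chain[-1] in base_of:
        parent = base_of[chain[-1]]
        if parent in seen:
            # Guard against a malformed map that loops back on itself.
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
        seen.add(parent)
    return chain


for step in lineage("RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2", BASE_MODEL):
    print(step)
```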
last_checkpoint/model-00001-of-00009.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aeed4aa6b7c17fcffac0032e984eb89780fafaf36ec0c2e1dada35fcddb3e8b3
+oid sha256:431f1be976a33a5d52716733a8a9c6ba5df3d62a0ebaee82ffb8362bfdde9f5c
 size 4882311064
```
last_checkpoint/model-00002-of-00009.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1ff18a680bbf93e0138772fe4b8f788e6c222638240878a32efb24ea1821df9e
+oid sha256:d1c1b7d9e99d48c95d806813a78258a831d9da8ef899fa41e35ba0bfdceb8806
 size 4983012160
```
last_checkpoint/model-00003-of-00009.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f6c2f03042fd3ae5f5fe1f921890151b7e18da6620f2a0e18bc497c9d0b46685
+oid sha256:1b8d52900af74c984f330d7026c69395a1be24e691312ac577f8ed0158942671
 size 4957821336
```
last_checkpoint/model-00004-of-00009.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c6c5091cd089808cd9f01df1f3d9646104191c1442fded7fd62579ecdd578100
+oid sha256:bc1cdf7982b3633fc557ad92589aa157da82578c0096d61c1c8b84a42aade052
 size 4882323744
```
last_checkpoint/model-00005-of-00009.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b0fe3e3dfa53e31a422dac36b8b90434b506421979a1e4deb6f101646f50b082
+oid sha256:13c153bd710e16a67e7b4b610cd1880b582daebcede35f4c996fc5344600cfbf
 size 4983012192
```
last_checkpoint/model-00006-of-00009.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aba801e1ba41c5cb2f6a84853f6c4c815ac432c7254bfa3759fa0fad06023c9f
+oid sha256:5e9fc9c940d40dc83b86f47523320d042697b4a7b0d52e0745001c242e8bed3b
 size 4957821336
```
last_checkpoint/model-00007-of-00009.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:29235cde971d316703d9b7e990841680f877ab1aee6ac21f6606635e959ec56e
+oid sha256:88713acec4c4392ec0444e8933c449118b6c702f92982ba9a8e9e6af77eea46a
 size 4882323744
```
last_checkpoint/model-00008-of-00009.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4f5177d619b2d5e51c5bd7d5aafb8f75133dc00e0a8258b453755437b4065d5e
+oid sha256:a1b022ad8cf464d383238aa9de11bf2a0f8ec7c3a18b14574716c428d3aa399c
 size 4983012192
```
last_checkpoint/model-00009-of-00009.safetensors CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:eb42b681a18a7add5077ba0332928909f0d6933d1412b16c66702509dde7aefc
+oid sha256:4829e5aded6e120f6f81339907b86534b6ce54562816c69c730c78a611b2ed45
 size 4983011344
```
last_checkpoint/training_args.bin CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:26b7969b3af7288627f8b855398c93e98127d8f74fc9deec993b0ec8c6ec6cdc
+oid sha256:b38d385f174117a5f05346643acdb5c08276ca9ac27d00272d8db377debd4d21
 size 7992
```
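Every weight shard and training_args.bin in this commit is stored as a Git LFS pointer: a short text file with a `version` line, an `oid` line (hash algorithm and hex digest), and a `size` line in bytes. Only the `oid` values change here; the sizes are identical, presumably because the retrained weights keep the same tensor shapes. Below is a minimal sketch, assuming the three-line pointer layout shown in this diff, that parses a pointer, checks a downloaded file against it, and totals the shard sizes. The helper names are hypothetical and this is not the git-lfs client.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_against_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream a downloaded file and compare its size and SHA-256 with the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so multi-GB shards never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# New pointer for model-00001-of-00009.safetensors, copied from the diff above.
shard1 = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:431f1be976a33a5d52716733a8a9c6ba5df3d62a0ebaee82ffb8362bfdde9f5c
size 4882311064
""")

# Sizes of the nine safetensors shards, unchanged by this commit.
shard_sizes = [4882311064, 4983012160, 4957821336, 4882323744, 4983012192,
               4957821336, 4882323744, 4983012192, 4983011344]
total_bytes = sum(shard_sizes)
print(f"{total_bytes:,} bytes = {total_bytes / 1e9:.2f} GB")  # 44,494,649,112 bytes = 44.49 GB
```

The nine shards total about 44.5 GB. Assuming 16-bit weights (the dtype is not shown in this diff), that is roughly 22 billion parameters, in line with the 22B Mistral-Small-Instruct-2409 base.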