RyanYr committed on
Commit 7cbbb6c
1 parent: ddf28b9

Model save

README.md CHANGED
@@ -1,7 +1,7 @@
 ---
-base_model: RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2
+base_model: RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1
 library_name: transformers
-model_name: self-correct_mistral-small-it_mMQA_dpo_iter3
+model_name: self-correct_mistral-small-it_mMQA_dpo_iter2_ref-iter1
 tags:
 - generated_from_trainer
 - trl
@@ -9,9 +9,9 @@ tags:
 licence: license
 ---
 
-# Model Card for self-correct_mistral-small-it_mMQA_dpo_iter3
+# Model Card for self-correct_mistral-small-it_mMQA_dpo_iter2_ref-iter1
 
-This model is a fine-tuned version of [RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2](https://huggingface.co/RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2).
+This model is a fine-tuned version of [RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1](https://huggingface.co/RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 
 ## Quick start
@@ -20,14 +20,14 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline
 
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter3", device="cuda")
+generator = pipeline("text-generation", model="RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2_ref-iter1", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/3nyg0g8s)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yyr/huggingface/runs/v7iyqyhs)
 
 This model was trained with DPO, a method introduced in [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://huggingface.co/papers/2305.18290).
 
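The model card states that this model was trained with DPO. To illustrate the objective from the cited paper (this is not code from this repository or from TRL), here is a sketch of the per-example loss. It assumes the summed log-probabilities of the chosen and rejected responses, under both the policy being trained and the frozen reference model, have already been computed. The helper names and the value `beta=0.1` are illustrative assumptions, not settings read from this run's `training_args.bin`.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value only, not read from this run
) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each *_logp argument is the summed log-probability log pi(y | x) of a whole
    response, under either the policy being trained or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) == log(1 + exp(-z)) == softplus(-z)
    return softplus(-logits)


# When the policy matches the reference, both log-ratios are 0 and the loss is log 2 (about 0.693).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss falls below log 2 when the policy raises the chosen response's log-probability relative to the reference, and rises above log 2 when it favors the rejected response instead.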
 
last_checkpoint/config.json CHANGED
@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter2",
+  "_name_or_path": "RyanYr/self-correct_mistral-small-it_mMQA_dpo_iter1",
   "architectures": [
     "MistralForCausalLM"
   ],
last_checkpoint/model-00001-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:00fe1bb21888156b09cb0852bc3d4da8392320b3834a1b73a49b220e394813f5
+oid sha256:5cd193c02a5656c0e5287a59dadcc574a3e02da8b03b7c569fc91b28d2c3cd6f
 size 4882311064
last_checkpoint/model-00002-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:258480b3307ec83d650be048c1694d12bb70e9110cead3fe0b2986d17a99d19c
+oid sha256:b280d780d7433a36bbc5674a26913e9bc8129d36abe83de11c70af59297049c0
 size 4983012160
last_checkpoint/model-00003-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:76eebe466e4989d58a77acededc5cc12bd7463bfdde7c16a4fdfdc38148ccd89
+oid sha256:873485e78c41e0b68f0c4e35d89a7976da37b9fd05be8c60608b662f484e90fd
 size 4957821336
last_checkpoint/model-00004-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8d2bd6e5e4a5b55a67cab6188d90f3e0c9500638905db3472d19e533a7118690
+oid sha256:a288396320b22dab7761c435a2b80469ee5fb921e9e63340a9917e0ae7199c4c
 size 4882323744
last_checkpoint/model-00005-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8d81f1d814ca73d528def3c26298510713ea6fb80cacc8ca9aca4540bb57495b
+oid sha256:3da42476985b8196f769e9c2622fc3ff6cb18fa4a0fe74f264949a314e6453cb
 size 4983012192
last_checkpoint/model-00006-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c5451f251a32de94b088f1a4ec933fc66862b0bcc12391c3fcc814ae61ed0eff
+oid sha256:92a9dcecfbc64df07fc725985251c7d77ff01ddf40e78c976560ccaddb655b41
 size 4957821336
last_checkpoint/model-00007-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e1b8acf4839e0d24873983f507065309fe476c94853513eddf8d37dc88bbd111
+oid sha256:9322c4eed8ffde8e7e0dff8e23e0f46f1dcfec26629ab8bd161f19c0c6c5b581
 size 4882323744
last_checkpoint/model-00008-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8a7b1b8814fad19ec019466fd555896d9beb5ea014b9d8da45c54eb46828a2bf
+oid sha256:98d34e4b6504b2bc96f0c91c6e9fe99852f220a234c713c0ef96e268508ce2fc
 size 4983012192
last_checkpoint/model-00009-of-00009.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1c5ef663d3b7080f9699410566f57fbc08a5277a6f745fe32c226aae51b0f7fc
+oid sha256:3311b1902f6785073b28612644caf873a8626f257000e8f9d7014da37271116b
 size 4983011344
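In all nine `last_checkpoint` shards the `size` line stays the same while the `oid` changes. This fits the same tensor layout holding different weight values. The sketch below parses a Git LFS pointer, assuming only its `key value` line format. It also totals the shard sizes copied from the hunks above. `parse_lfs_pointer` is a hypothetical helper and does not come from any library.

```python
# Byte sizes of the nine last_checkpoint/*.safetensors shards, copied from the
# "size" lines in the hunks above (the same before and after this commit).
SHARD_SIZES = [
    4882311064, 4983012160, 4957821336,
    4882323744, 4983012192, 4957821336,
    4882323744, 4983012192, 4983011344,
]


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split an LFS pointer into {key: value}, one 'key value' per line."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Old and new pointers for model-00001-of-00009.safetensors, copied from the diff.
old_ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:00fe1bb21888156b09cb0852bc3d4da8392320b3834a1b73a49b220e394813f5\n"
    "size 4882311064\n"
)
new_ptr = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:5cd193c02a5656c0e5287a59dadcc574a3e02da8b03b7c569fc91b28d2c3cd6f\n"
    "size 4882311064\n"
)
print("size unchanged:", old_ptr["size"] == new_ptr["size"])  # True
print("oid changed:", old_ptr["oid"] != new_ptr["oid"])       # True

total = sum(SHARD_SIZES)
print(f"{total} bytes = {total / 1e9:.2f} GB = {total / 2**30:.2f} GiB")
# 44494649112 bytes = 44.49 GB = 41.44 GiB
```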
last_checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:275dbb49bf52c523a077745889de2c4f134ad827e06aa7db4c2090c7f1f19452
-size 7992
+oid sha256:df47e9f8c5a16a9a09907e8106037fb79a44b429595bed66e27edd5e0695b632
+size 8056
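An LFS pointer records the SHA-256 digest and the byte size of the real object. You can therefore check a downloaded file, such as a shard or `training_args.bin`, against the pointer shown in the diff. The sketch below uses only the Python standard library. `matches_pointer` is a hypothetical name. The demo hashes a throwaway temporary file, not a real checkpoint.

```python
import hashlib
import os
import tempfile


def matches_pointer(path: str, oid: str, size: int, chunk: int = 1 << 20) -> bool:
    """Hypothetical helper: True if the file has the pointer's byte size and 'sha256:<hex>' oid."""
    if os.path.getsize(path) != size:
        return False  # size mismatch: reject without hashing
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so multi-GB shards need not fit in memory
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return f"sha256:{digest.hexdigest()}" == oid


# Self-contained demo with a throwaway file instead of a real checkpoint file.
payload = b"example payload"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_oid = "sha256:" + hashlib.sha256(payload).hexdigest()
ok = matches_pointer(tmp.name, demo_oid, len(payload))
bad = matches_pointer(tmp.name, demo_oid, 8056)  # wrong size, so rejected
os.remove(tmp.name)
print(ok, bad)  # True False
```

The size check runs first because it is cheap. A file with the wrong length is rejected before the SHA-256 pass reads any bytes.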