NeuralNovel committed
Commit 3c38ae7
Parent: d668cc2

Update README.md

Files changed (1)
  1. README.md +45 -32
README.md CHANGED
@@ -1,4 +1,6 @@
 ---
 tags:
 - generated_from_trainer
 model-index:
@@ -9,12 +11,23 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- <details><summary>See axolotl config</summary>
 
- axolotl version: `0.4.0`
 ```yaml
- base_model: out/Mistral-DPO
 model_type: AutoModelForCausalLM
 tokenizer_type: AutoTokenizer
 is_mistral_derived_model: true
@@ -23,14 +36,28 @@ load_in_8bit: false
 load_in_4bit: false
 strict: false
 
- rl: dpo
 datasets:
- - path: NeuralNovel/Neural-DPO
-   type: chatml.intel
-   split: train
 format: "[INST] {instruction} [/INST]"
 no_input_format: "[INST] {instruction} [/INST]"
-
 dataset_prepared_path:
 val_set_size: 0.05
 output_dir: ./out
@@ -48,7 +75,7 @@ wandb_log_model:
 
 gradient_accumulation_steps: 4
 micro_batch_size: 2
- num_epochs: 6
 optimizer: adamw_bnb_8bit
 lr_scheduler: cosine
 learning_rate: 0.000005
@@ -84,42 +111,28 @@ special_tokens:
 
 ```
 
- </details><br>
-
- # out
-
- This model was trained from scratch on an unknown dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-06
 - train_batch_size: 2
- - eval_batch_size: 8
 - seed: 42
 - gradient_accumulation_steps: 4
 - total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
- - training_steps: 801
 
 ### Training results
 
 
 ### Framework versions
 
 ---
+ license: apache-2.0
+ base_model: mistralai/Mistral-7B-v0.1
 tags:
 - generated_from_trainer
 model-index:
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/645cfe4603fc86c46b3e46d1/FXt-g2q8JE-l77_gp23T3.jpeg)
+
+ # NeuralNovel/Senzu-7B-v0.1
+
+ Embracing a quiet *storm* ...
+
+ ## Model Details
+
+ This model is a full-parameter fine-tune of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1),
+ trained on the Neural-DPO, metamath_gsm8k, and RPGPT_PublicDomain-alpaca datasets.
+
+ It excels at character roleplay and responds accurately to a wide variety of complex questions.
 
 
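An editorial sketch, not part of the original card: prompting presumably follows the `[INST] {instruction} [/INST]` template from the training config, and the commented loading calls are standard `transformers` API (model name taken from the card title; everything else is illustrative).

```python
# Hypothetical usage sketch -- the prompt template mirrors the
# "[INST] {instruction} [/INST]" format from the training config.

def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Mistral-style [INST] template."""
    return f"[INST] {instruction} [/INST]"

prompt = build_prompt("Describe your character in one sentence.")
print(prompt)  # [INST] Describe your character in one sentence. [/INST]

# Generation (requires `transformers` and suitable hardware):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("NeuralNovel/Senzu-7B-v0.1")
# model = AutoModelForCausalLM.from_pretrained("NeuralNovel/Senzu-7B-v0.1")
# ids = tok(prompt, return_tensors="pt").input_ids
# out = model.generate(ids, max_new_tokens=128)
# print(tok.decode(out[0], skip_special_tokens=True))
```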
 ```yaml
+ base_model: mistralai/Mistral-7B-v0.1
 model_type: AutoModelForCausalLM
 tokenizer_type: AutoTokenizer
 is_mistral_derived_model: true
 
 load_in_4bit: false
 strict: false
 
 datasets:
+ - path: practical-dreamer/RPGPT_PublicDomain-alpaca
+   type: alpaca
+   format: "[INST] {instruction} [/INST]"
+   no_input_format: "[INST] {instruction} [/INST]"
+ - path: shuyuej/metamath_gsm8k
+   type: jeopardy
   format: "[INST] {instruction} [/INST]"
   no_input_format: "[INST] {instruction} [/INST]"
+ - path: NeuralNovel/Neural-DPO
+   type:
+     system_prompt: ""
+     field_system: system
+     field_instruction: chosen
+     field_output: chosen
+   format: "[INST] {instruction} [/INST]"
+   no_input_format: "[INST] {instruction} [/INST]"
 dataset_prepared_path:
 val_set_size: 0.05
 output_dir: ./out
 
 
 gradient_accumulation_steps: 4
 micro_batch_size: 2
+ num_epochs: 1
 optimizer: adamw_bnb_8bit
 lr_scheduler: cosine
 learning_rate: 0.000005
 
 
 ```
 
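One thing to watch in the config above (an editorial note, not from the card): a YAML mapping may hold only one value per key, so repeating `datasets:` as a top-level key makes loaders typically either reject the file or silently keep only the last block; the three entries belong in a single `datasets:` list. The failure mode in miniature:

```python
# A YAML mapping, like a Python dict, keeps one value per key: assigning
# "datasets" three times overwrites instead of accumulating -- the same
# thing a last-wins YAML loader does with a repeated top-level key.
config = {}
config["datasets"] = [{"path": "practical-dreamer/RPGPT_PublicDomain-alpaca"}]
config["datasets"] = [{"path": "shuyuej/metamath_gsm8k"}]
config["datasets"] = [{"path": "NeuralNovel/Neural-DPO"}]  # only this one survives

print(config["datasets"])  # [{'path': 'NeuralNovel/Neural-DPO'}]
```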
  ### Training hyperparameters
 
 The following hyperparameters were used during training:
 - learning_rate: 5e-06
 - train_batch_size: 2
+ - eval_batch_size: 2
 - seed: 42
 - gradient_accumulation_steps: 4
 - total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
+ - num_epochs: 1
 
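As a quick cross-check of the hyperparameter list (assuming a single-device run, since the card does not state a GPU count):

```python
# Cross-check: effective batch size = per-device batch size
# x gradient_accumulation_steps x number of devices (assumed 1 here).
train_batch_size = 2
gradient_accumulation_steps = 4
num_devices = 1  # assumption -- not stated in the card

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 8, matching the reported total_train_batch_size
```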
  ### Training results
 
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 0.2061        | 0.01  | 1    | 0.3139          |
+ | 0.0           | 0.25  | 32   | 0.0000          |
+ | 0.0           | 0.5   | 64   | 0.0010          |
+ | 0.0           | 0.76  | 96   | 0.0000          |
 
 
  ### Framework versions