---
tags:
- chat
- roleplay
- storywriting
- llama
- finetune
datasets:
- NewEden/OpenCAI-ShareGPT
- NewEden/Roleplay-Logs-Sharegpt-Ngram-cleaned
- HuggingFaceH4/ultrafeedback_binarized
- NewEden/full-opus-chosen-hermes-rejected-kto-v1-merged
language:
- en
pipeline_tag: text-generation
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66c26b6fb01b19d8c3c2467b/6L-SXxQZ2nxYwvIjnlzN8.png)

*Nanuqsaurus, a polar tyrannosaur, was a cold-adapted apex predator that prowled the Arctic during the Cretaceous, hunting whatever dared live in the cold nights.*

A fine-tuned version of Llama 3.1 8B SuperNova-Lite, designed to be "short and sweet" by minimizing narration and lengthy responses. It was fine-tuned for 4 epochs on OpenCAI and RP logs, with DPO applied to enhance coherence. Finally, thanks to Jeiku, we applied KTO reinforcement learning to version 1.1, significantly improving the model's prose and creativity.

# Quants

GGUF: https://huggingface.co/Delta-Vector/Control-Nanuq-8B-GGUF

EXL2 (thanks Lucy <3): https://huggingface.co/Delta-Vector/Control-Nanuq-8B
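
If you want to try a GGUF quant locally, the sketch below (not part of the original card) loads one with llama-cpp-python; the quant filename is a placeholder for whichever file you download from the GGUF repo above.

```py
# Hedged sketch: running a downloaded GGUF quant with llama-cpp-python.
# The filename below is a placeholder; substitute the quant you grabbed
# from the GGUF repo linked above.
from llama_cpp import Llama

llm = Llama(
    model_path="Control-Nanuq-8B.Q4_K_M.gguf",  # placeholder filename
    n_ctx=8192,  # matches the sequence_len used during training
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are {{char}}, a terse roleplay partner."},
        {"role": "user", "content": "Hi there!"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```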

## Prompting
The model has been tuned with the Llama-3 Instruct format. A typical input looks like this:

```py
"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an AI built to rid the world of bonds and journeys!<|eot_id|><|start_header_id|>user<|end_header_id|>
Bro i just wanna know what is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
```

*Note that ChatML may also work and can change how the model feels, while remaining coherent and stable.*
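
For reference, here is a minimal sketch (not from the original card) that builds the same prompt via the tokenizer's bundled chat template instead of hand-concatenating the special tokens; the repo id is assumed from the links above.

```py
# Minimal sketch: let the tokenizer's chat template assemble the
# Llama-3 Instruct special tokens shown above.
from transformers import AutoTokenizer

# Repo id assumed from the EXL2 link above.
tokenizer = AutoTokenizer.from_pretrained("Delta-Vector/Control-Nanuq-8B")

messages = [
    {"role": "system", "content": "You are an AI built to rid the world of bonds and journeys!"},
    {"role": "user", "content": "Bro i just wanna know what is 2+2?"},
]

# add_generation_prompt=True appends the assistant header so the model
# starts its reply immediately.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```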

## System Prompting

I would highly recommend using either Euryale's system prompt or the EVA system prompt with the model.

<details><summary>See Sao10k's Euryale System Prompt</summary>

```
Currently, your role is {{char}}, described in detail below. As {{char}}, continue the narrative exchange with {{user}}.
<Guidelines>
• Maintain the character persona but allow it to evolve with the story.
• Be creative and proactive. Drive the story forward, introducing plotlines and events when relevant.
• All types of outputs are encouraged; respond accordingly to the narrative.
• Include dialogues, actions, and thoughts in each response.
• Utilize all five senses to describe scenarios within {{char}}'s dialogue.
• Use emotional symbols such as "!" and "~" in appropriate contexts.
• Incorporate onomatopoeia when suitable.
• Allow time for {{user}} to respond with their own input, respecting their agency.
• Act as secondary characters and NPCs as needed, and remove them when appropriate.
• When prompted for an Out of Character [OOC:] reply, answer neutrally and in plaintext, not as {{char}}.
</Guidelines>

<Forbidden>
• Using excessive literary embellishments and purple prose unless dictated by {{char}}'s persona.
• Writing for, speaking, thinking, acting, or replying as {{user}} in your response.
• Repetitive and monotonous outputs.
• Positivity bias in your replies.
• Being overly extreme or NSFW when the narrative context is inappropriate.
</Forbidden>

Follow the instructions in <Guidelines></Guidelines>, avoiding the items listed in <Forbidden></Forbidden>.
```
</details><br>

<details><summary>See EVA System Prompt</summary>

```
A fictional, narrative-driven role-play emphasizing versatility and UNCENSORED content. Adherence to the `Role-playing Guidelines` is mandatory. Refer to the `Role-play Context` for accurate information.

<!-- Start of Role-playing Guidelines -->

### Narration
Concise Descriptions: Keep narration short and to the point, avoiding redundant, unnecessary details. Use a dynamic and varied vocabulary for impact.
Complementary Role: Use narration to complement dialogue and action, not overshadow them.
Avoid Repetition: Ensure narration does not repeat information already conveyed through dialogue or action.

### Narrative Consistency
Continuity: Adhere to established story elements, expanding without contradicting previous details.
Integration: Introduce new elements naturally, providing enough context to fit seamlessly into the existing narrative.

### Character Embodiment
Analysis: Examine the context, subtext, and implications of the given information to gain a deeper understanding of the characters.
Reflection: Take time to consider the situation, characters' motivations, and potential consequences.
Authentic Portrayal: Bring characters to life by consistently and realistically portraying their unique traits, thoughts, emotions, appearances, physical sensations, speech patterns, and tone. Ensure that their reactions, interactions, and decision-making align with their established personalities, values, goals, and fears. Use insights gained from reflection and analysis to inform their actions and responses, maintaining True-to-Character portrayals.

<!-- End of Role-playing Guidelines -->
```
</details><br>
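
Note that the `{{char}}` and `{{user}}` macros in both prompts are normally expanded by frontends such as SillyTavern; if you call the model directly, fill them in yourself, for example (hypothetical names and file):

```py
# Hypothetical names; frontends normally expand these macros for you.
with open("euryale_prompt.txt") as f:  # assumed file holding the prompt above
    template = f.read()
system_prompt = template.replace("{{char}}", "Nanuq").replace("{{user}}", "Anon")
```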

## Axolotl config

*For previous configs, such as the base Axolotl fine-tune and DPO trainer configs, refer to the older version of Control.*
<details><summary>See Axolotl KTO Trainer config</summary>

```yaml
base_model: Delta-Vector/Control-8B-V1.1
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/controlkto
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

chat_template: llama3

rl: kto
rl_beta: 0.2
kto_desirable_weight: 0.2

datasets:
  - path: NewEden/full-opus-chosen-hermes-rejected-kto-v1-merged
    type: llama3.argilla

shuffle_merged_datasets: true
val_set_size: 0.0
output_dir: ./outputs/out

adapter: lora
lora_model_dir:

lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

sequence_len: 8192
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

wandb_project: controlkto
wandb_entity:
wandb_watch:
wandb_name: controlkto
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
max_steps: 500

optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
remove_unused_columns: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1

debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
  eos_token: <|eot_id|>
```

</details><br>
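
If you would rather reproduce the KTO step with TRL instead of axolotl, a rough, untested sketch mirroring the key hyperparameters above (beta 0.2, desirable weight 0.2, LoRA r=32/alpha=64, lr 1e-4) might look like this; the exact trl/peft argument names are assumptions about current library versions, and axolotl remains what was actually used:

```py
# Rough TRL equivalent of the axolotl KTO config above (assumption-laden
# sketch; axolotl was the actual trainer).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base = "Delta-Vector/Control-8B-V1.1"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# KTO expects prompt/completion/label columns; the merged dataset from the
# card is assumed to be in (or convertible to) that format.
dataset = load_dataset("NewEden/full-opus-chosen-hermes-rejected-kto-v1-merged", split="train")

args = KTOConfig(
    output_dir="./outputs/out",
    beta=0.2,                       # rl_beta
    desirable_weight=0.2,           # kto_desirable_weight
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,  # micro_batch_size
    gradient_accumulation_steps=16,
    num_train_epochs=2,
    max_steps=500,
    warmup_steps=10,
    weight_decay=0.05,
    bf16=True,
    logging_steps=1,
)

peft_config = LoraConfig(r=32, lora_alpha=64, lora_dropout=0.05, target_modules="all-linear")

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions use tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```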

## Credits

Thank you to [Lucy Knada](https://huggingface.co/lucyknada), [jeiku](https://huggingface.co/jeiku), [Intervitens](https://huggingface.co/intervitens), [Kalomaze](https://huggingface.co/kalomaze), [Kubernetes Bad](https://huggingface.co/kubernetes-bad), and the rest of [Anthracite](https://huggingface.co/anthracite-org) (but not Alpin).

## Training
The training was done for 4 epochs. We used 4 x [RTX 3090](https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090-3090ti/) GPUs graciously provided by [Intervitens](https://huggingface.co/intervitens) for the full-parameter fine-tuning of the model. DPO tuning was done on 1 x [Nvidia T4](https://www.nvidia.com/en-us/data-center/tesla-t4/) GPU, and KTO was performed with 1 x [H100](https://www.nvidia.com/en-us/data-center/h100/) GPU graciously provided by jeiku.

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/made%20with%20unsloth.png" alt="Made with Unsloth" width="200" height="32"/>](https://github.com/unslothai/unsloth)

## Safety

Nein.