---
license: apache-2.0
---
This model is based on Meta-Llama-3-8B-Instruct and is governed by the Meta Llama 3 License agreement:
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct


We have not benchmarked this model yet, so we don't know exactly how it scores, but we think real prompts and real-world usage are more telling anyway.


From our testing, this model:

- Refuses less often
- Is less censored
- Follows requests better
- Replies in requested formats better, without adding unnecessary information

We are happy for anyone to try it out and give some feedback.


Training:
- 2048 sequence length, while the base model uses 8192. From our testing it still performs fine at the full 8192 context.
- Trained on a modified and improved version of Cognitive Computations' (Eric Hartford's) Dolphin dataset: https://huggingface.co/datasets/cognitivecomputations/dolphin
- Training took around 2 days on 2x RTX 3090 on our own machine, using 4-bit loading and QLoRA with rank 64 and alpha 128, resulting in ~2% trainable weights (see the sketch after this list).
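
For readers who want to reproduce a similar setup outside of Axolotl, here is a minimal QLoRA sketch using transformers and peft with the same rank/alpha/dropout as the Axolotl config further down. The base model id and the `target_modules` choice are assumptions for illustration; this is not our exact training code.

```
# Minimal 4-bit QLoRA sketch (illustrative only; actual training used Axolotl, config below).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base model repo id

# 4-bit loading, equivalent to load_in_4bit: true in the Axolotl config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)

# LoRA settings matching lora_r: 64, lora_alpha: 128, lora_dropout: 0.05
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules="all-linear",  # stands in for lora_target_linear: true (recent peft versions)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report roughly ~2% trainable parameters
```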


The goal for this model is to be less censored and great at general tasks, like the previous Dolphin-based models by Eric Hartford.
We started training this BEFORE they launched their own full-weight-trained Llama-3-8B Dolphin 2.9 with their curated datasets and the newer "Dolphin 2.9" dataset, but we think this model is still a unique take on Llama 3 8B Instruct and the Dolphin dataset.
https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b


The difference from their Dolphin 2.9 model is that we trained this using Meta's new Llama 3 Instruct format rather than the ChatML format that Dolphin models are usually trained on,
because in our testing the model performed better when kept on the format it was originally trained with.

Instruct format:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
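
Because the model keeps the stock Llama 3 Instruct template, the tokenizer's built-in chat template should reproduce the layout above. A minimal sketch, assuming the FP16 repo id from the Quants section below and placeholder messages:

```
from transformers import AutoTokenizer

# Repo id taken from the FP16 link below; swap in a quantized repo if needed.
tokenizer = AutoTokenizer.from_pretrained("OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about llamas."},
]

# tokenize=False returns the raw prompt string; add_generation_prompt appends the
# trailing <|start_header_id|>assistant<|end_header_id|> header for the model to complete.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```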


Quants:

AWQ: https://huggingface.co/OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-AWQ

GGUF: https://huggingface.co/OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1-GGUF

FP16: https://huggingface.co/OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin

Exllamav2:

4bpw: https://huggingface.co/OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1-exl2-h8-4bpw-exl2

8bpw: https://huggingface.co/OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1-exl2-h8-8bpw-exl2
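
For the plain FP16 weights, a hedged generation sketch with transformers (the AWQ, GGUF, and EXL2 quants need their own runtimes such as AutoAWQ/vLLM, llama.cpp, and exllamav2, which are not shown here):

```
# Sketch: running the FP16 checkpoint with transformers; repo id from the FP16 link above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "List three creative uses for a paperclip."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# If generation runs past the answer, pass the <|eot_id|> token id via eos_token_id.
outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```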


[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

Axolotl Config:
```
base_model: Meta-Llama-3-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
  
train_on_inputs: false
group_by_length: false
load_in_8bit: false
load_in_4bit: true
strict: false
sequence_len: 2048
bf16: true
fp16: false
tf32: false
flash_attention: true

# Data
datasets:
  - path: flan1m-universal-uncensored-system-2048.jsonl
    type:
      system_prompt: ""
      system_format: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
      field_system: system
      field_instruction: input
      field_output: output
      format: "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
      no_input_format: "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    
warmup_steps: 10
dataset_prepared_path: ./last_run_prepared

# Iterations
num_epochs: 1
saves_per_epoch: 4

# Evaluation
val_set_size: 0.01
eval_table_size:
eval_table_max_new_tokens:
eval_sample_packing: false
evals_per_epoch: 4

# LoRA
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
save_safetensors: true

# Sampling
sample_packing: true
pad_to_sequence_len: true

# Batching
gradient_accumulation_steps: 32
micro_batch_size: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true

# Optimizer
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

# Misc
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
debug:
deepspeed: zero3_bf16.json
weight_decay: 0.1
special_tokens:
   pad_token: <|end_of_text|>
```