Update README.md
README.md CHANGED

## Model description

Yi-34B base model fine-tuned on the AEZAKMI v1 dataset. Training took around 33 hours on a single local RTX 3090 Ti.

It's like airoboros, but with less gptslop, no refusals, and less of the typical language used by RLHF-ed OpenAI models. Say goodbye to "It's important to remember"!

The prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy, free chatbot.

The cost of this fine-tune was about $3 in electricity. This was my first attempt at training Yi-34B with this dataset.

The base model used for fine-tuning was the 4k-context Yi-34B-Llama model shared by chargoddard.

## Prompt Format

I recommend using the ChatML format, as this is what was used during the fine-tune.
Here's the prompt format you should use. You can set a different system message; the model seems to respect that fine, so it wasn't overfitted.

```
<|im_start|>system
A chat.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
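
For reference, here is a minimal sketch of assembling this prompt by hand and generating with the Hugging Face transformers library, assuming the model loads through the standard AutoModel classes. The repo id below is just a placeholder; substitute the path you actually load the model from.

```python
# Minimal sketch: build the ChatML prompt manually and generate a reply.
# "your-namespace/yi-34b-aezakmi-v1" is a placeholder repo id, not the real path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/yi-34b-aezakmi-v1"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

system_message = "A chat."
user_message = "Write a short story about a lighthouse keeper."

# ChatML layout used during the fine-tune.
prompt = (
    f"<|im_start|>system\n{system_message}<|im_end|>\n"
    f"<|im_start|>user\n{user_message}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```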

## Intended uses & limitations

Use is limited by the Yi license.

## Known Issues

I recommend setting the repetition penalty to around 1.05 to avoid repetition. So far I have had good results running this model with a temperature of 1.2.
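
In transformers terms, those settings translate into roughly the sketch below, which reuses the `model`, `tokenizer` and `inputs` from the prompt example above; only the temperature and repetition penalty values come from my testing, the rest are illustrative defaults.

```python
# Recommended sampling settings; only temperature and repetition_penalty
# come from the notes above, the other arguments are illustrative defaults.
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.2,
    repetition_penalty=1.05,
    max_new_tokens=512,
)
```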

Multi-turn conversations could be a bit better; if you ask it to rewrite something with some fixes, it has a tendency to just repeat the previous response verbatim without any improvements. This is especially noticeable with a repetition penalty of 1.0.
There is still some gptslop left: some responses will end with a last paragraph along the lines of "Remember that bla bla bla". I will try to get rid of it in the next version of the dataset.
Stories have ChatGPT-like paragraph spacing; I will try to introduce more stories with long paragraphs in the next dataset version.

## Axolotl training parameters

- bnb_4bit_use_double_quant: true
- bnb_4bit_compute_dtype: torch.bfloat16
- is_llama_derived_model: true
- load_in_4bit: true
- adapter: qlora
- sequence_len: 1200
- sample_packing: false
- lora_r: 16
- lora_alpha: 32
- lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj
- lora_target_linear: true
- pad_to_sequence_len: true
- micro_batch_size: 1
- gradient_accumulation_steps: 1
- num_epochs: 1
- optimizer: adamw_bnb_8bit
- lr_scheduler: constant
- learning_rate: 0.00007
- train_on_inputs: false
- group_by_length: false
- bf16: true
- bfloat16: true
- flash_optimum: false
- gradient_checkpointing: true
- flash_attention: true
- seed: 42
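
These are Axolotl config keys. If you want to mirror the quantization and LoRA setup outside of Axolotl, a rough peft/bitsandbytes equivalent might look like the sketch below; it is only an illustration of the settings above, not the actual training script.

```python
# Rough peft/transformers equivalent of the 4-bit QLoRA settings listed above.
# Illustrative only; the actual run was done through Axolotl.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "down_proj", "up_proj"],
    task_type="CAUSAL_LM",
)
```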

## Upcoming

I will release adapter files and maybe an exllama v2 quant shortly.