lucyknada committed
Commit 5716b1c (parent: bf3e765)

Update README.md

Files changed (1): README.md (+97, −0)
README.md CHANGED
@@ -49,6 +49,103 @@ To create a working GGUF file, make the following adjustments:

These modifications should allow you to use the model with llama.cpp, albeit with the mentioned context limitation.

+ ## axolotl config
+
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.1`
+ ```yaml
+ base_model: IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ datasets:
+   - path: anthracite-org/Gryphe-3.5-16k-Subset
+     type: sharegpt
+     conversation: chatml
+   - path: Epiculous/Synthstruct-Gens-v1-Filtered-n-Cleaned
+     type: sharegpt
+     conversation: chatml
+   - path: anthracite-org/Stheno-Data-Filtered
+     type: sharegpt
+     conversation: chatml
+   - path: Epiculous/SynthRP-Gens-v1-Filtered-n-Cleaned
+     type: sharegpt
+     conversation: chatml
+   - path: lodrick-the-lafted/NopmWritingStruct
+     type: sharegpt
+     conversation: chatml
+   - path: anthracite-org/kalo-opus-instruct-22k-no-refusal
+     type: sharegpt
+     conversation: chatml
+
+ chat_template: chatml
+
+ val_set_size: 0.01
+ output_dir: ./outputs/out
+
+ adapter:
+ lora_r:
+ lora_alpha:
+ lora_dropout:
+ lora_target_linear:
+
+ sequence_len: 16384
+ # sequence_len: 32768
+ sample_packing: true
+ eval_sample_packing: false
+ pad_to_sequence_len: true
+
+ wandb_project:
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 32
+ micro_batch_size: 1
+ num_epochs: 2
+ optimizer: adamw_bnb_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.00002
+ weight_decay: 0.05
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: true
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_ratio: 0.1
+ evals_per_epoch: 4
+ eval_table_size:
+ eval_max_new_tokens: 128
+ saves_per_epoch: 1
+
+ debug:
+ deepspeed:
+ fsdp:
+ fsdp_config:
+
+ special_tokens:
+   pad_token: <|finetune_right_pad_id|>
+
+ ```
+
+ </details><br>
+
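All of the datasets above are preprocessed as ShareGPT conversations and rendered into ChatML (`type: sharegpt`, `conversation: chatml`), and `chat_template: chatml` applies the same format at inference time. The snippet below is only a sketch of what that formatting looks like with `transformers`: it reuses the base-model tokenizer named in the config and assumes that tokenizer bundles a ChatML chat template (as its `-chatml` suffix suggests); in practice you would substitute this model's own tokenizer.

```python
# Illustrative sketch only (not part of this commit): render a conversation in the
# ChatML format implied by `conversation: chatml` / `chat_template: chatml` above.
# Assumes the tokenizer below ships a ChatML chat template, as its name suggests.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the training setup in one sentence."},
]

# tokenize=False returns the raw <|im_start|>...<|im_end|> string;
# add_generation_prompt=True appends the opening assistant header.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```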

## Credits

- [anthracite-org/Stheno-Data-Filtered](https://huggingface.co/datasets/anthracite-org/Stheno-Data-Filtered)