Fizzarolli committed on
Commit
5b6f34c
1 Parent(s): 5df3bc6

Update README.md

Files changed (1)
  1. README.md +13 -193
README.md CHANGED
@@ -3,203 +3,23 @@ license: apache-2.0
  base_model: h2oai/h2o-danube3-500m-base
  tags:
  - axolotl
- - generated_from_trainer
- model-index:
- - name: clite7-500m-test-ckpts
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.1`
- ```yaml
- # Weights and Biases logging config
- wandb_project: clite
- wandb_entity:
- wandb_watch:
- wandb_name: v7
- wandb_log_model:
-
- # Model architecture config
- base_model: h2oai/h2o-danube3-500m-base
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
- chat_template: anthropic
-
- # Hugging Face saving config
- hub_model_id:
- hub_strategy:
- push_dataset_to_hub:
- hf_use_auth_token:
-
- # Model checkpointing config
- output_dir: ./lora-out
- resume_from_checkpoint:
- save_steps:
- saves_per_epoch: 5
- save_safetensors: true
- save_total_limit: 2
-
- # Mixed precision training config
- bf16: true
- fp16: false
- tf32: false
-
- # Model loading config
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- # Sequence config
- sequence_len: 8192
- s2_attention: false
- sample_packing: true
- eval_sample_packing: true
- pad_to_sequence_len: true
- train_on_inputs: true
- group_by_length: false
-
- # Dataset config
  datasets:
- - path: kalomaze/Opus_Instruct_3k
-   type: chat_template
- val_set_size: 0.1
- evaluation_strategy:
- eval_steps:
- evals_per_epoch: 10
- test_datasets:
- dataset_prepared_path: ./last-preped-dataset
- shuffle_merged_datasets: true
-
- # Training hyperparameters
- num_epochs: 3
- gradient_accumulation_steps: 2
- micro_batch_size: 8
- eval_batch_size: 8
- warmup_steps: 10
- optimizer: paged_adamw_8bit
- lr_scheduler: cosine
- learning_rate: 0.00004
- cosine_min_lr_ratio: 0.1
- weight_decay: 0.1
- max_grad_norm: 1
- logging_steps: 1
-
- # Model optimization
- gradient_checkpointing: unsloth
- xformers_attention: false
- flash_attention: true
- sdp_attention: false
- unsloth_cross_entropy_loss: false
- unsloth_lora_mlp: false
- unsloth_lora_qkv: false
- unsloth_lora_o: false
-
- # Loss monitoring config
- early_stopping_patience: false
- loss_watchdog_threshold: 100.0
- loss_watchdog_patience: 3
-
- # Debug config
- debug: true
- seed: 02496
-
- # DeepSpeed and FSDP config
- deepspeed:
- fsdp:
- fsdp_config:
-
- # Token config
- special_tokens:
- tokens: # these are delimiters
- - "<EOT>"
 
- # Checkpoint backing up
- hub_model_id: Fizzarolli/clite7-500m-test-ckpts
- hub_strategy: all_checkpoints

  ```

- </details><br>
-
- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/ruthenic/clite/runs/diil6zl9)
- # clite7-500m-test-ckpts
-
- This model is a fine-tuned version of [h2oai/h2o-danube3-500m-base](https://huggingface.co/h2oai/h2o-danube3-500m-base) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.3765
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data

- More information needed

- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 4e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 2496
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 16
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 2.9517 | 0.0952 | 1 | 3.7616 |
- | 2.9796 | 0.1905 | 2 | 3.6462 |
- | 2.9632 | 0.2857 | 3 | 3.3357 |
- | 2.6639 | 0.3810 | 4 | 3.0408 |
- | 2.5048 | 0.4762 | 5 | 2.7322 |
- | 2.4911 | 0.5714 | 6 | 2.5094 |
- | 2.1291 | 0.6667 | 7 | 2.3554 |
- | 4.8452 | 0.7619 | 8 | 1.6418 |
- | 1.6902 | 0.8571 | 9 | 1.6067 |
- | 1.6166 | 0.9524 | 10 | 1.5581 |
- | 1.5985 | 1.0476 | 11 | 1.5162 |
- | 1.5001 | 1.0476 | 12 | 1.4847 |
- | 1.4679 | 1.1429 | 13 | 1.4601 |
- | 1.4981 | 1.2381 | 14 | 1.4440 |
- | 1.4864 | 1.3333 | 15 | 1.4293 |
- | 1.4895 | 1.4286 | 16 | 1.4174 |
- | 1.4653 | 1.5238 | 17 | 1.4061 |
- | 1.4447 | 1.6190 | 18 | 1.3988 |
- | 1.4492 | 1.7143 | 19 | 1.3937 |
- | 1.4244 | 1.8095 | 20 | 1.3896 |
- | 1.4319 | 1.9048 | 21 | 1.3858 |
- | 1.4238 | 2.0 | 22 | 1.3830 |
- | 1.4725 | 2.0952 | 23 | 1.3810 |
- | 1.3862 | 2.0952 | 24 | 1.3794 |
- | 1.3526 | 2.1905 | 25 | 1.3783 |
- | 1.4134 | 2.2857 | 26 | 1.3776 |
- | 1.3909 | 2.3810 | 27 | 1.3771 |
- | 1.4016 | 2.4762 | 28 | 1.3769 |
- | 1.3494 | 2.5714 | 29 | 1.3766 |
- | 1.3783 | 2.6667 | 30 | 1.3765 |
-
-
- ### Framework versions
-
- - Transformers 4.42.4
- - Pytorch 2.1.2+cu118
- - Datasets 2.19.1
- - Tokenizers 0.19.1
 
  datasets:
+ - kalomaze/Opus_Instruct_3k
+ language:
+ - en
+ ---

+ # Clite
+ claude lite. for sure not a euphemism

+ ## Prompting
  ```
+ You are an AI assistant named Claude created by Anthropic to be helpful, harmless, and honest.

+ Human: [Query]

+ Assistant: [Response]<EOT>

+ Human: ...
+ ```
+ HOWEVER. the model is a bit stupid, so you should probably stop on `\n\nHuman:` instead of just `<EOT>`, as it'll be more reliable
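For reference, here is a minimal sketch of driving that prompt format with 🤗 Transformers. Everything in it is an illustrative assumption rather than something fixed by this commit: the repo id (taken from the `hub_model_id` in the old config), the example query, and the sampling settings; the reply is simply trimmed at `\n\nHuman:` / `<EOT>` after generation instead of using a custom stopping criterion.

```python
# Minimal sketch of the prompt format described above (assumptions noted in comments).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Fizzarolli/clite7-500m-test-ckpts"  # assumed checkpoint repo; swap in whichever you use
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

system = ("You are an AI assistant named Claude created by Anthropic "
          "to be helpful, harmless, and honest.")
prompt = f"{system}\n\nHuman: What's a good name for a pet axolotl?\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, then cut the reply at the next turn marker;
# per the note above, "\n\nHuman:" is a more reliable stop than <EOT> alone.
completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])
reply = completion.split("\n\nHuman:")[0].split("<EOT>")[0].strip()
print(reply)
```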