zorooo committed on
Commit
480ffbf
1 Parent(s): cb5773f

End of training

Files changed (2)
  1. README.md +159 -0
  2. adapter_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,159 @@
---
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: NousResearch/Llama-2-7b-hf
model-index:
- name: MathLlama-7b
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.3.0`
```yaml
base_model: NousResearch/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
hub_model_id: MathLlama-7b

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: zorooo/Eval_Math_Derivatives
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./qlora-out-2

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: axolotl_run_1_math_llama
wandb_entity:
wandb_watch:
wandb_name: math_llama_run2
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 5
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```

</details><br>
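
For readers not using Axolotl, the quantization and LoRA settings above map roughly onto the following PEFT/Transformers objects. This is a minimal sketch: the `target_modules` list is an assumption for what `lora_target_linear: true` expands to on Llama-2, and it is not the exact code Axolotl runs.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit (QLoRA-style) quantization, matching load_in_4bit: true and bf16: true
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings, matching lora_r / lora_alpha / lora_dropout above.
# Targeting all of Llama-2's linear projections is assumed here.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```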

# MathLlama-7b

This model is a fine-tuned version of [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) on the zorooo/Eval_Math_Derivatives dataset (Alpaca-format instruction data, with 5% held out for validation, per the Axolotl config above).
It achieves the following results on the evaluation set:
- Loss: 0.1580
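
The weights in this repository are a LoRA adapter rather than a full model, so they are loaded on top of the base checkpoint. Below is a minimal inference sketch with Transformers + PEFT; the adapter repo id (`zorooo/MathLlama-7b`) and the Alpaca-style prompt are assumptions based on the config above, not something stated elsewhere in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NousResearch/Llama-2-7b-hf"
adapter_id = "zorooo/MathLlama-7b"  # assumed repo id (uploader + hub_model_id)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA adapter

# Alpaca-style prompt, since the training data uses type: alpaca
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nFind the derivative of f(x) = x^2 * sin(x).\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```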

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding optimizer and scheduler setup follows the list):
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8 (2 per device × 4 accumulation steps)
- optimizer: paged AdamW (32-bit) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 5
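
As referenced above, this is a minimal sketch of how an equivalent optimizer and learning-rate schedule could be constructed outside the Trainer, assuming bitsandbytes' paged 32-bit AdamW and the cosine schedule helper from Transformers; the total step count is illustrative, not a value reported in this card.

```python
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

# Paged 32-bit AdamW, matching optimizer: paged_adamw_32bit and weight_decay: 0.0
optimizer = bnb.optim.PagedAdamW32bit(
    model.parameters(),  # parameters of the PEFT-wrapped model being trained
    lr=2e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
)

# Cosine decay with 100 warmup steps; ~130 total steps is an illustrative guess
# (the training log below ends around step 119 at epoch 4.52).
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,
    num_training_steps=130,
)
```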

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.952         | 0.04  | 1    | 0.9490          |
| 0.9351        | 0.27  | 7    | 0.9474          |
| 0.9431        | 0.54  | 14   | 0.9181          |
| 0.8078        | 0.82  | 21   | 0.7671          |
| 0.5693        | 1.06  | 28   | 0.5249          |
| 0.309         | 1.33  | 35   | 0.3288          |
| 0.2752        | 1.6   | 42   | 0.2607          |
| 0.2406        | 1.87  | 49   | 0.2267          |
| 0.2241        | 2.12  | 56   | 0.2068          |
| 0.2212        | 2.39  | 63   | 0.1932          |
| 0.1991        | 2.66  | 70   | 0.1842          |
| 0.173         | 2.93  | 77   | 0.1738          |
| 0.162         | 3.18  | 84   | 0.1711          |
| 0.1357        | 3.46  | 91   | 0.1681          |
| 0.15          | 3.73  | 98   | 0.1664          |
| 0.1553        | 4.0   | 105  | 0.1610          |
| 0.1263        | 4.25  | 112  | 0.1613          |
| 0.132         | 4.52  | 119  | 0.1580          |

### Framework versions

- PEFT 0.7.2.dev0
- Transformers 4.37.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:86bec5f50e37d656eb95c891a2db9c1b17734b0a7a88522e1175d69ea92f8372
size 319977229