Weyaxi committed on
Commit bf6711f
1 Parent(s): 62c7db0

checkpoints readme

Files changed (1):
  1. README.md +158 -213

README.md CHANGED
@@ -1,224 +1,169 @@
  ---
  license: other
- base_model: meta-llama/Meta-Llama-3-8B
  tags:
  - axolotl
  - generated_from_trainer
  model-index:
  - name: Einstein-v6.1-Llama3-8B
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.0`
- ```yaml
- base_model: meta-llama/Meta-Llama-3-8B
- model_type: LlamaForCausalLM
- tokenizer_type: AutoTokenizer
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- chat_template: chatml
- datasets:
-   - path: data/merged_all.json
-     ds_type: json
-     type: alpaca
-     conversation: chatml
-
-   - path: data/gpteacher-instruct-special-alpaca.json
-     ds_type: json
-     type: gpteacher
-     conversation: chatml
-
-   - path: data/wizardlm_evol_instruct_70k_random_half.json
-     ds_type: json
-     type: alpaca
-     conversation: chatml
-
-   - path: data/capybara_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/synthia-v1.3_sharegpt_12500.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/cot_alpaca_gpt4_extracted_openhermes_2.5_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/slimorca_dedup_filtered_95k_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/airoboros_3.2_without_contextual_slimorca_orca_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/allenai_wild_chat_gpt4_english_toxic_random_half_4k_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     strict: false
-     conversation: chatml
-
-   - path: data/pippa_bagel_repo_3k_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/gpt4_data_lmys_1m_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/sharegpt_gpt4_english.json
-     ds_type: json
-     type: sharegpt
-     conversation: chatml
-
-   - path: data/no_robots_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     strict: false
-     conversation: chatml
-
-   - path: data/oasst_top1_from_fusechatmixture_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     strict: false
-     conversation: chatml
-
-   - path: data/everythinglm-data-v3_sharegpt.json
-     ds_type: json
-     type: sharegpt
-     strict: false
-     conversation: chatml
-
- dataset_prepared_path: last_run_prepared
- val_set_size: 0.002
-
- output_dir: ./Einstein-v6.1-Llama3-8B-model
-
- sequence_len: 8192
- sample_packing: true
- pad_to_sequence_len: true
- eval_sample_packing: false
-
- wandb_project: Einstein
- wandb_entity:
- wandb_watch:
- wandb_name: Einstein-v6.1-Llama3-2-epoch
- wandb_log_model:
- hub_model_id: Weyaxi/Einstein-v6.1-Llama3-8B
-
- save_safetensors: true
-
- gradient_accumulation_steps: 4
- micro_batch_size: 1
- num_epochs: 2
- optimizer: adamw_bnb_8bit # look
- lr_scheduler: cosine
- learning_rate: 0.000005 # look
-
- train_on_inputs: false
- group_by_length: false
- bf16: true
- fp16: false
- tf32: false
-
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 10
- evals_per_epoch: 2
- eval_table_size:
- eval_table_max_new_tokens: 128
- saves_per_epoch: 2
- debug:
-
- deepspeed: zero3_bf16_cpuoffload_params.json
- weight_decay: 0.0
- fsdp:
- fsdp_config:
- special_tokens:
-   bos_token: "<s>"
-   eos_token: "<|im_end|>"
-   unk_token: "<unk>"
-   pad_token: <|end_of_text|> # changed
- tokens:
-   - "<|im_start|>"
-
- ```
-
- </details><br>
-
- # Einstein-v6.1-Llama3-8B
-
- This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.5786
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-06
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 9
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 36
- - total_eval_batch_size: 9
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - num_epochs: 2
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | 1.6849        | 0.0   | 1    | 1.7294          |
- | 0.6045        | 0.5   | 507  | 0.6127          |
- | 0.5986        | 1.0   | 1014 | 0.5868          |
- | 0.5136        | 1.48  | 1521 | 0.5786          |
-

- ### Framework versions

- - Transformers 4.40.0.dev0
- - Pytorch 2.1.2+cu118
- - Datasets 2.18.0
- - Tokenizers 0.15.0
  ---
+ language:
+ - en
  license: other
  tags:
  - axolotl
  - generated_from_trainer
+ - instruct
+ - finetune
+ - chatml
+ - gpt4
+ - synthetic data
+ - science
+ - physics
+ - chemistry
+ - biology
+ - math
+ - llama
+ - llama3
+ base_model: meta-llama/Meta-Llama-3-8B
+ datasets:
+ - allenai/ai2_arc
+ - camel-ai/physics
+ - camel-ai/chemistry
+ - camel-ai/biology
+ - camel-ai/math
+ - metaeval/reclor
+ - openbookqa
+ - mandyyyyii/scibench
+ - derek-thomas/ScienceQA
+ - TIGER-Lab/ScienceEval
+ - jondurbin/airoboros-3.2
+ - LDJnr/Capybara
+ - Cot-Alpaca-GPT4-From-OpenHermes-2.5
+ - STEM-AI-mtl/Electrical-engineering
+ - knowrohit07/saraswati-stem
+ - sablo/oasst2_curated
+ - lmsys/lmsys-chat-1m
+ - TIGER-Lab/MathInstruct
+ - bigbio/med_qa
+ - meta-math/MetaMathQA-40K
+ - openbookqa
+ - piqa
+ - metaeval/reclor
+ - derek-thomas/ScienceQA
+ - scibench
+ - sciq
+ - Open-Orca/SlimOrca
+ - migtissera/Synthia-v1.3
+ - TIGER-Lab/ScienceEval
+ - allenai/WildChat
+ - microsoft/orca-math-word-problems-200k
+ - openchat/openchat_sharegpt4_dataset
+ - teknium/GPTeacher-General-Instruct
+ - m-a-p/CodeFeedback-Filtered-Instruction
+ - totally-not-an-llm/EverythingLM-data-V3
+ - HuggingFaceH4/no_robots
+ - OpenAssistant/oasst_top1_2023-08-25
+ - WizardLM/WizardLM_evol_instruct_70k
  model-index:
  - name: Einstein-v6.1-Llama3-8B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 62.46
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 82.41
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 66.19
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 55.1
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 79.32
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 66.11
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
+       name: Open LLM Leaderboard
  ---

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/Z-THA-NDPl3YPUnACsQ0c.png)

+ Checkpoints of [Weyaxi/Einstein-v6.1-Llama3-8B](https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B). Head to the main model for more information :)

+ https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B
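---

Usage note (editorial, not part of the committed README): a minimal sketch of loading the linked model with `transformers`, assuming the repo's tokenizer config ships the ChatML chat template that the axolotl config above points to. The `revision` value in the comment is hypothetical; substitute a real branch or commit from this checkpoints repo's revision list to load an intermediate training step.

```python
# Minimal sketch: load the final Einstein-v6.1-Llama3-8B model and generate
# with its ChatML prompt format. Assumes a standard transformers install.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Weyaxi/Einstein-v6.1-Llama3-8B"  # main model linked above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# To pull an intermediate checkpoint from this repo instead, pass a revision:
# model = AutoModelForCausalLM.from_pretrained(
#     model_id, revision="checkpoint-507")  # hypothetical revision name

# Build the prompt through the tokenizer's chat template (ChatML here),
# then generate and decode only the newly produced tokens.
messages = [{"role": "user", "content": "State Newton's second law."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```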