Commit 07958a3 by lucyknada (1 parent: d0c518b)

Upload ./README.md with huggingface_hub

Files changed (1): README.md (new file, +161 lines)
---
library_name: transformers
license: llama3
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
tags:
- generated_from_trainer
model-index:
- name: outputs
  results: []
---
### exl2 quant (measurement.json in main branch)
---
### check revisions for quants
---
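
The exl2 quants advertised above live on separate revisions (branches) of the repository. Below is a minimal download sketch using `huggingface_hub`, assuming the usual one-branch-per-bitrate layout; the repo id and revision name are placeholders rather than values taken from this card.

```python
# Hedged sketch: fetch one quantised revision with huggingface_hub.
# Both repo_id and revision below are placeholders; check the repository's
# branch list for the real quant names.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="lucyknada/CAI-SuperNova-exl2",  # placeholder repo id
    revision="6.0bpw",                       # placeholder: one branch per quant size
)
print(local_dir)
```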

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: NewEden/CharacterAI-logs-sharegpt-Ngram-Cleaned
    type: sharegpt
    conversation: llama3
  - path: NewEden/OpenCAI-ShareGPT
    type: sharegpt
    conversation: llama3

chat_template: llama3

#val_set_size: 0.01
output_dir: ./outputs

adapter:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:

sequence_len: 16384
# sequence_len: 32768
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: CAI-Supernova
wandb_entity:
wandb_watch:
wandb_name: CAI-Supernova-2
wandb_log_model:

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 4
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5
weight_decay: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: unsloth
early_stopping_patience:
resume_from_checkpoint:
#auto_resume_from_checkpoints: true
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 15
#evals_per_epoch: 4
eval_table_size:
#eval_max_new_tokens: 128
saves_per_epoch: 1

debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
  eos_token: <|eot_id|>
```

</details><br>
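
For orientation, here is a small sketch of how a ShareGPT-style record is rendered into the llama3 chat format selected above (`type: sharegpt`, `conversation: llama3`, `chat_template: llama3`). The example conversation is invented, and the `human`/`gpt` role mapping is an assumption about the ShareGPT loader rather than code from this repository.

```python
# Hedged sketch: render one made-up ShareGPT record with the base model's
# llama3 chat template (requires access to the base model's tokenizer).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arcee-ai/Llama-3.1-SuperNova-Lite")

sharegpt_record = {
    "conversations": [
        {"from": "human", "value": "Hi! Who are you?"},
        {"from": "gpt", "value": "I'm a character-style assistant."},
    ]
}

# Assumed mapping from ShareGPT speaker tags to chat-template roles.
role_map = {"system": "system", "human": "user", "gpt": "assistant"}
messages = [
    {"role": role_map[turn["from"]], "content": turn["value"]}
    for turn in sharegpt_record["conversations"]
]

# Produces <|start_header_id|>role<|end_header_id|> ... <|eot_id|> blocks,
# ending each turn with the <|eot_id|> token configured as eos above.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```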

# outputs

This model is a fine-tuned version of [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite) on the NewEden/CharacterAI-logs-sharegpt-Ngram-Cleaned and NewEden/OpenCAI-ShareGPT datasets.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Training used the two ShareGPT-format conversation datasets listed in the axolotl config above (NewEden/CharacterAI-logs-sharegpt-Ngram-Cleaned and NewEden/OpenCAI-ShareGPT); no validation split was configured.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the effective batch size is worked out in the sketch after this list):
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- total_eval_batch_size: 4
- optimizer: paged 8-bit AdamW (paged_adamw_8bit) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 15
- num_epochs: 4

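As a quick sanity check, the reported total_train_batch_size follows directly from the per-device and accumulation settings above:

```python
# Effective batch size implied by the hyperparameters listed in this card.
micro_batch_size = 1                # train_batch_size per device
gradient_accumulation_steps = 2
num_devices = 4
sequence_len = 16384                # from the axolotl config; sample packing fills sequences

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
tokens_per_optimizer_step = total_train_batch_size * sequence_len  # upper bound with packing

print(total_train_batch_size)       # 8, matching the value above
print(tokens_per_optimizer_step)    # 131072
```
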
### Training results



### Framework versions

- Transformers 4.44.2
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
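
A minimal inference sketch against the library versions listed above; the repository id is a placeholder, since this card only records the local `outputs` directory name. With `<|eot_id|>` configured as the eos token, generation should stop at the end of the assistant turn.

```python
# Hedged sketch: chat-style generation with Transformers >= 4.44.
# "your-namespace/CAI-Supernova" is a placeholder repo id, not the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/CAI-Supernova"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```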