nymtheescobar committed
Commit 37e45b9 · verified · 1 Parent(s): 27574be

delete readme

Files changed (1)
  1. README.md +0 -164
README.md CHANGED
@@ -1,164 +0,0 @@
- ---
- base_model: meta-llama/Llama-3.2-3B
- library_name: peft
- license: llama3.2
- tags:
- - axolotl
- - generated_from_trainer
- model-index:
- - name: BanglaLLama-3.2-11b-3b-unolp-culturax-base-v0.0.2
- results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.1`
- ```yaml
- base_model: meta-llama/Llama-3.2-3B
- model_type: LlamaForCausalLM
- #processor_type: AutoProcessor
- tokenizer_type: AutoTokenizer
-
- #skip_prepare_dataset: true
- #remove_unused_columns: false
- #sample_packing: false
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- # datasets:
- # #- path: mhenrichsen/alpaca_2k_test
- # - path: yahma/alpaca-cleaned
- # type: alpaca
- max_steps: 10000
- #chat_template: llama3_2_vision
- pretraining_dataset:
- - path: "uonlp/CulturaX"
- name: bn
- type: pretrain
- dataset_prepared_path: /workspace/datasets/last_run_prepared_pretrain_llama3.2-3b_uonlp_CulturaX_bn_9Oct2024_1_30_PM
- val_set_size: 0.00
- output_dir: /workspace/outputs/llama-3.2-11b-pretrain-uonlp-culturax-6Oct2024_10_20_AM
-
- # push to hf hub
- hub_model_id: BanglaLLM/BanglaLLama-3.2-11b-3b-unolp-culturax-base-v0.0.2
- hub_strategy: end
-
- #sequence_len: 4096
- sequence_len: 8192
- pad_to_sequence_len: false
-
- # all below changed above
- sample_packing: true
- eval_sample_packing: false
- pad_to_sequence_len: true
- remove_unused_columns: true
-
- adapter: lora
- lora_model_dir:
- lora_r: 32
- lora_alpha: 16
- lora_dropout: 0.05
- #lora_target_modules: 'language_model.model.layers.[\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj'
- lora_target_linear: true
- lora_fan_in_fan_out:
- lora_modules_to_save:
- - embed_tokens
- - lm_head
-
- wandb_project: banglallm-training
- wandb_entity: banglallm
- wandb_watch:
- wandb_name: balglallm-training-llama3.2-3b-pretraining-9Oct2024_1_30_PM
- wandb_run_id: balglallm-training-llama3.2-3b-pretraining-9Oct2024_1_30_PM-id-1
- wandb_log_model: checkpoint
-
- gradient_accumulation_steps: 4 #8
- micro_batch_size: 1
- num_epochs: 1
- optimizer: adamw_bnb_8bit
- lr_scheduler: cosine
- learning_rate: 0.0002
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
- s2_attention:
-
- #warmup_steps: 10
- warmup_ratio: 0.1
- evals_per_epoch: 4
- eval_table_size:
- #eval_max_new_tokens: 128
- saves_per_epoch: 1
- debug:
- deepspeed:
- weight_decay: 0.0
- fsdp:
- fsdp_config:
- special_tokens:
- pad_token: <|end_of_text|>
-
- ```
-
- </details><br>
-
- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/banglallm/banglallm-training/runs/balglallm-training-llama3.2-3b-pretraining-9Oct2024_1_30_PM-id-1)
- # BanglaLLama-3.2-11b-3b-unolp-culturax-base-v0.0.2
-
- This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) on an unknown dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 4
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 1000
- - training_steps: 10000
-
- ### Training results
-
-
-
- ### Framework versions
-
- - PEFT 0.13.0
- - Transformers 4.45.1
- - Pytorch 2.3.1+cu121
- - Datasets 2.21.0
- - Tokenizers 0.20.0
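
For reference, the deleted card described a LoRA adapter (r=32, alpha=16, dropout 0.05, with `embed_tokens` and `lm_head` in `lora_modules_to_save`) trained with axolotl on the Bangla split of `uonlp/CulturaX` on top of `meta-llama/Llama-3.2-3B` and pushed to `BanglaLLM/BanglaLLama-3.2-11b-3b-unolp-culturax-base-v0.0.2`. The sketch below is not from the card; it is a minimal illustration of how one might load such a PEFT adapter, assuming the adapter repository is still available on the Hub and that you have access to the gated base model.

```python
# Hypothetical usage sketch for the adapter described in the deleted card.
# Assumes the LoRA adapter still exists at the hub_model_id from the axolotl
# config above and that access to the gated Llama-3.2-3B base has been granted.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B"  # base_model from the config
adapter_id = "BanglaLLM/BanglaLLama-3.2-11b-3b-unolp-culturax-base-v0.0.2"  # hub_model_id

tokenizer = AutoTokenizer.from_pretrained(base_id)
# bf16 was used during training per the config, so load the base in bfloat16.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach the LoRA adapter; the config also saved embed_tokens and lm_head,
# which PEFT restores from the adapter checkpoint.
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "বাংলা ভাষা"  # Bangla text, since the adapter was pretrained on CulturaX (bn)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The card itself left intended uses and limitations as "More information needed", so treat this strictly as an illustration of loading a PEFT adapter, not as documented behaviour of the model.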