# BafoGPT-3B-it

BafoGPT-3B-it is a continued-pretrained Gemma-2-2b model finetuned with QLoRA on 10% of the [ChallengerSpaceShuttle/zulu-finetuning-dataset](https://huggingface.co/datasets/ChallengerSpaceShuttle/zulu-finetuning-dataset) dataset.

This is the first iteration in an effort to build IsiZulu models that can attain performance comparable to models that typically require millions of dollars to train from scratch.

## 🔍 Applications

This supervised finetuned model has a context length of 8k. It can generate coherent IsiZulu text from simple instructions, but it still hallucinates on complex instructions. We are working on improving this first version.

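A minimal generation sketch with 🤗 Transformers is shown below. The Hub repo id and the Alpaca-style prompt template are assumptions: the repo id is inferred from this card's name, and the template from the `prompt_style: alpaca` setting in the training configuration further down.

```python
# Minimal sketch: load BafoGPT-3B-it and generate IsiZulu text.
# The repo id below is an assumption based on this model card's name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ChallengerSpaceShuttle/BafoGPT-3B-it"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was finetuned in bf16
    device_map="auto",
)

# Alpaca-style prompt, matching prompt_style: alpaca in the training config.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nNgixoxele indaba emfushane ngekati.\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
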
## ⚡ Quantized models

## 🏆 Evaluation

## 🧩 Configuration

The code used to train the model can be found at [BafoGPT](https://github.com/Motsepe-Jr/bafoGPT/tree/main), with the following training configuration.

```yaml
checkpoint_dir: checkpoints/google/gemma-2-2b
out_dir: out/finetune/lora
precision: bf16-true
quantize: bnb.nf4-dq
devices: 1
num_nodes: 1
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_query: true
lora_key: true
lora_value: true
lora_projection: true
lora_mlp: false
lora_head: false
data:
  class_path: litgpt.data.JSON
  init_args:
    json_path: data/train.json
    mask_prompt: false
    val_split_fraction: 0.05
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4
train:
  save_interval: 1000
  log_interval: 1
  global_batch_size: 4
  micro_batch_size: 1
  lr_warmup_steps: 1000
  epochs: 1
  max_seq_length: 1024
  min_lr: 6.0e-05
eval:
  interval: 1000
  max_new_tokens: 100
  max_iters: 100
  initial_validation: false
  final_validation: true
optimizer: AdamW
logger_name: csv
seed: 1337
```

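The repository trains with litgpt, but the sketch below shows roughly how the QLoRA settings above translate to the Hugging Face `bitsandbytes`/`peft` stack: `bnb.nf4-dq` corresponds to NF4 with double quantization, and the `lora_*` flags map to the LoRA rank, alpha, dropout, and attention target modules. The `*_proj` module names are assumptions based on Gemma-2's Transformers implementation, not taken from the training repo.

```python
# Rough Hugging Face equivalent of the litgpt QLoRA settings above (a sketch, not the training code).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # quantize: bnb.nf4-dq
    bnb_4bit_use_double_quant=True,         # the "-dq" part (double quantization)
    bnb_4bit_compute_dtype=torch.bfloat16,  # precision: bf16-true
)

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,                # lora_r
    lora_alpha=16,      # lora_alpha
    lora_dropout=0.05,  # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
    # lora_query/key/value/projection: true; lora_mlp/head: false
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```

With rank 8 applied only to the attention projections, this trains on the order of a few million LoRA parameters, a small fraction of a percent of the base model.
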
Architecture Config

```json
{
  "architectures": [
    "Gemma2ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": 50.0,
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "eos_token_id": 1,
  "final_logit_softcapping": 30.0,
  "head_dim": 256,
  "hidden_act": "gelu_pytorch_tanh",
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 2304,
  "initializer_range": 0.02,
  "intermediate_size": 9216,
  "max_position_embeddings": 8192,
  "model_type": "gemma2",
  "num_attention_heads": 8,
  "num_hidden_layers": 26,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "torch_dtype": "float32",
  "transformers_version": "4.42.4",
  "use_cache": true,
  "vocab_size": 288256
}
```

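As a rough sanity check on the model size, the dimensions above imply roughly 2.7B parameters once the 288k-token vocabulary is included. The sketch below is a back-of-the-envelope estimate that ignores the RMSNorm weights and assumes tied input/output embeddings, as in Gemma-2.

```python
# Back-of-the-envelope parameter count from the architecture config above.
cfg = {
    "vocab_size": 288256, "hidden_size": 2304, "intermediate_size": 9216,
    "num_hidden_layers": 26, "num_attention_heads": 8,
    "num_key_value_heads": 4, "head_dim": 256,
}

h, d = cfg["hidden_size"], cfg["head_dim"]
q = h * cfg["num_attention_heads"] * d       # query projection
kv = 2 * h * cfg["num_key_value_heads"] * d  # key + value projections (GQA)
o = cfg["num_attention_heads"] * d * h       # output projection
mlp = 3 * h * cfg["intermediate_size"]       # gate, up and down projections (GeGLU)
per_layer = q + kv + o + mlp

embeddings = cfg["vocab_size"] * h           # assumed tied with the LM head
total = cfg["num_hidden_layers"] * per_layer + embeddings
print(f"~{total / 1e9:.2f}B parameters")     # roughly 2.7B with the 288k-token vocabulary
```
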