gardner committed c475ecf (verified) · 1 parent: 746b337

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,165 @@
---
license: apache-2.0
library_name: peft
tags:
- generated_from_trainer
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
model-index:
- name: TinyLlama-1.1B-SlimOrca-Function-Calling-3T
  results: []
datasets:
- Open-Orca/SlimOrca-Dedup
- gardner/glaive-function-calling-v2-sharegpt
language: en
---

# TinyLlama-1.1B-SlimOrca-Function-Calling-3T

This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) on the [SlimOrca-Dedup](https://huggingface.co/datasets/Open-Orca/SlimOrca-Dedup) and [glaive-function-calling-v2](https://huggingface.co/datasets/gardner/glaive-function-calling-v2-sharegpt) datasets.

## Evaluation

It achieves the following results on the evaluation set:
- Loss: 0.7403

See `scripts/llm-eval.py` to recreate the evaluation results from the test split, published here: [gardner/tinyllama-function-calling-eval](https://huggingface.co/datasets/gardner/tinyllama-function-calling-eval). The model responds with a function call when one is expected and refuses when it does not have access to tools. In the linked dataset, `result1` is generated by this model and `result2` comes from the test dataset.
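For quick manual testing outside the eval script, a minimal inference sketch follows. The Hub repo id, the example function definition, and the exact system-prompt wording are illustrative assumptions (tool definitions in the training data follow the general glaive-function-calling-v2 style of listing functions in the system prompt); the ChatML chat template shipped in `tokenizer_config.json` is used to build the prompt.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "gardner/TinyLlama-1.1B-SlimOrca-Function-Calling-3T"  # assumed Hub repo id for this model

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).to("cuda")  # use "cpu" if no GPU is available

# Illustrative, hypothetical tool definition in the glaive-function-calling-v2 style:
# the available functions are described to the model inside the system prompt.
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant with access to the following functions. Use them if required: "
            '{"name": "get_weather", "description": "Get the current weather", '
            '"parameters": {"type": "object", "properties": {"location": {"type": "string"}}}}'
        ),
    },
    {"role": "user", "content": "What's the weather like in Wellington right now?"},
]

# The bundled tokenizer ships a ChatML chat template, so apply_chat_template builds the prompt.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Print only the newly generated tokens (the assistant turn).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```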

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.3.0`
```yaml
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: Open-Orca/SlimOrca-Dedup
    type: sharegpt
    conversation: chatml

  - path: gardner/glaive-function-calling-v2-sharegpt
    type: sharegpt
    conversation: chatml

dataset_prepared_path: ./.prepared-datasets/glaive-function-calling-v2-sharegpt
val_set_size: 0.05
output_dir: ./tinyllama/function-calling/chatml

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:

```

</details><br>
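To reproduce the fine-tune, a config like the one above is passed straight to the Axolotl CLI; per the Axolotl documentation for this version, training is typically launched with something along the lines of `accelerate launch -m axolotl.cli.train config.yml` (the config file name here is assumed), after the datasets listed in the config have been prepared.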

## Training procedure

The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: True
- load_in_4bit: False
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32

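For reference, these settings map onto a `transformers.BitsAndBytesConfig`. The sketch below is an assumed reconstruction (the actual object is built internally by the training stack), showing how the base model would be loaded in 8-bit with the same thresholds:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirrors the quantization settings listed above; the 4-bit fields are left at
# their defaults because load_in_4bit was False during training.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    load_in_4bit=False,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
    quantization_config=bnb_config,
    device_map="auto",  # 8-bit loading requires bitsandbytes and a CUDA device
)
```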
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4

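The `total_train_batch_size` follows from the other settings: a per-device batch of 2 × 4 gradient accumulation steps = 8 sequences per optimizer step, which is consistent with training on a single GPU.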
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.2492 | 0.0 | 1 | 1.2363 |
| 0.7621 | 0.25 | 1896 | 0.8096 |
| 0.757 | 0.5 | 3792 | 0.7852 |
| 0.6424 | 0.75 | 5688 | 0.7717 |
| 0.5944 | 1.04 | 7584 | 0.7625 |
| 0.73 | 1.29 | 9480 | 0.7585 |
| 0.6781 | 1.54 | 11376 | 0.7521 |
| 0.829 | 1.79 | 13272 | 0.7471 |
| 0.6964 | 2.08 | 15168 | 0.7467 |
| 0.6652 | 2.33 | 17064 | 0.7453 |
| 0.7645 | 2.58 | 18960 | 0.7420 |
| 0.5702 | 2.83 | 20856 | 0.7392 |
| 0.7049 | 3.12 | 22752 | 0.7418 |
| 0.6087 | 3.37 | 24648 | 0.7412 |
| 0.6064 | 3.62 | 26544 | 0.7405 |
| 0.7125 | 3.87 | 28440 | 0.7403 |


### Framework versions

- PEFT 0.7.0
- Transformers 4.37.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0
config.json ADDED
@@ -0,0 +1,28 @@
{
  "_name_or_path": "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.0.dev0",
  "use_cache": false,
  "vocab_size": 32000
}
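The architecture matches the TinyLlama-1.1B base: 22 layers, hidden size 2048, and grouped-query attention with 32 attention heads sharing 4 key/value heads (8 query heads per KV head). `use_cache` is saved as false, likely because gradient checkpointing was enabled during training; it can be re-enabled for inference.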
generation_config.json ADDED
@@ -0,0 +1,7 @@
{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0,
  "transformers_version": "4.37.0.dev0"
}
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c00c53d49251b24c2b789c84e862a73b4679297e88816923bd94f1d0bd62111b
size 2200164273
scripts/llm-eval.py ADDED
@@ -0,0 +1,80 @@
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
import argilla as rg
import os
from tqdm import tqdm

dir_path = os.path.dirname(os.path.realpath(__file__))


# Connect to the Argilla instance where the generated/reference pairs are pushed.
rg.init(
    api_url="<argilla-api-url>",
    api_key="<argilla-api-key>"
)

# A two-response preference dataset: response1 (model output) vs response2 (reference answer).
ds = rg.FeedbackDataset.for_preference_modeling(
    number_of_responses=2,
    context=False,
    use_markdown=False,
    guidelines=None,
    metadata_properties=None,
    vectors_settings=None,
)

# Path to the merged LoRA checkpoint produced by the training run.
model_dir = dir_path + '/lora-out/merged'

tokenizer = AutoTokenizer.from_pretrained(model_dir)

dataset = load_dataset("gardner/glaive-function-calling-v2-sharegpt", split="test")


def preprocess_function_calling(examples):
    texts = []
    answers = []
    for chat in examples["conversations"]:
        for turn in chat:
            # Rename ShareGPT keys (from/value) to chat-template keys (role/content).
            turn['role'] = turn.pop('from')
            turn['content'] = turn.pop('value')

            if turn['role'] == 'human':
                turn['role'] = 'user'
            if turn['role'] == 'gpt':
                turn['role'] = 'assistant'

        if chat[-1]['role'] == 'assistant':
            answers.append(chat[-1]['content'])
            del chat[-1]  # remove the last assistant turn; it becomes the reference answer

        texts.append(tokenizer.apply_chat_template(chat, tokenize=False))

    return {"texts": texts, "answer": answers}


texts = dataset.map(preprocess_function_calling, batched=True)

model = AutoModelForCausalLM.from_pretrained(model_dir).to("cuda")

records = []

for text in tqdm(texts):
    prompt = text['texts'] + "<|im_start|>assistant\n"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    outputs = model.generate(**inputs, max_new_tokens=512)  # greedy decoding by default
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    prompt_len = len(prompt)

    # Strip the prompt and the ChatML end-of-turn marker to keep only the model's answer.
    response1 = response[prompt_len:].replace("<|im_end|>\n", "").strip()
    response2 = text['answer']
    print(response1)

    records.append(rg.FeedbackRecord(fields={
        "prompt": prompt,
        "response1": response1,
        "response2": response2,
    }))

    # text['response'] = response

ds.add_records(records)
ds.push_to_argilla(name="function-calling", workspace="argilla")

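Note that running this script as published assumes a merged checkpoint at `scripts/lora-out/merged` and valid Argilla credentials in place of the placeholders; the `response1`/`response2` fields it pushes appear to correspond to `result1`/`result2` in the linked evaluation dataset.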
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,45 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": false,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "padding_side": "right",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "trust_remote_code": false,
  "unk_token": "<unk>",
  "use_default_system_prompt": false,
  "use_fast": true,
  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
}
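The `chat_template` entry is a ChatML-style template. As a quick illustration (repo id assumed, conversation hypothetical), applying it with `add_generation_prompt=True` renders a prompt like this:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gardner/TinyLlama-1.1B-SlimOrca-Function-Calling-3T")  # assumed repo id

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))
# Expected output:
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# Hello!<|im_end|>
# <|im_start|>assistant
```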