chainyo committed on
Commit
62c31bf
1 Parent(s): 8bc1e8a

add adapters and explanations

README.md CHANGED
@@ -7,9 +7,169 @@ library_name: transformers
  pipeline_tag: text-generation
  tags:
  - peft
- - chat
  - llama
  ---
  # LlaMA Natural Instructions
 
- wip
+ ![LlaMA Natural Instructions](./llama-natural-instructions-removebg-preview.png)
+
+ This model is a fine-tuned version of [llama-7b](https://huggingface.co/decapoda-research/llama-7b-hf) from [Meta](https://huggingface.co/facebook)
+ on the [Natural Instructions](https://huggingface.co/datasets/Muennighoff/natural-instructions) dataset from [AllenAI](https://huggingface.co/allenai).
+
+ ⚠️ **This model is for research purposes only (see the [license](https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/LICENSE)).**
+
+ ## WandB Report
+
+ Click on the badge below to see the full report on Weights & Biases.
+
+ [![Weights & Biases](https://img.shields.io/badge/Weights_&_Biases-FFCC33?style=for-the-badge&logo=WeightsAndBiases&logoColor=black)](https://api.wandb.ai/links/chainyo-mleng/ia2mloow)
+
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ pip install loralib bitsandbytes datasets git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git sentencepiece
+ ```
+
+ ### Format of the input
+
+ The input should be a string of text with the following format:
+
+ ```python
+ from typing import Union
+
+ prompt_template = {
+     "prompt": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n",
+     "response": "### Response:"
+ }
+
+ def generate_prompt(
+     definition: str,
+     inputs: str,
+     targets: Union[None, str] = None,
+ ) -> str:
+     """Generate a prompt from an instruction and an input."""
+     res = prompt_template["prompt"].format(
+         instruction=definition, input=inputs
+     )
+
+     if targets:
+         res = f"{res}{targets}"
+
+     return res
+
+ def get_response(output: str) -> str:
+     """Extract the response from the raw model output."""
+     return output.split(prompt_template["response"])[1].strip()
+ ```
+
+ Feel free to use these utility functions to generate the prompt and to extract the response from the model output; a short usage example follows the list below.
+
+ - `definition` is the instruction describing the task. It's generally a single sentence explaining the expected output and the reasoning steps to follow.
+ - `inputs` is the input to the task. It can be a single sentence or a paragraph. It's the context used by the model to generate the response to the task.
+ - `targets` is the expected output of the task. It's only used for training the model. _It's not required for inference._
+
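+ For example, calling `generate_prompt` with a made-up instruction and input (not taken from the dataset) fills in the template, and `get_response` keeps only the text generated after the `### Response:` marker:
+
+ ```python
+ # Hypothetical example, only to illustrate the helpers above.
+ prompt = generate_prompt(
+     "Answer the question using the given context.",
+     "The Eiffel Tower is located in Paris. Where is the Eiffel Tower?",
+ )
+ print(prompt)
+ # ### Instruction:
+ # Answer the question using the given context.
+ #
+ # ### Input:
+ # The Eiffel Tower is located in Paris. Where is the Eiffel Tower?
+ #
+ # ### Response:
+
+ # Once the model has generated text after the prompt, get_response keeps only the answer part:
+ print(get_response(prompt + "Paris"))
+ # Paris
+ ```
+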
+ ### Inference
+
+ You can either load the base model and apply the adapters on top of it, or load the full model with the adapters already merged into the weights.
+
+ #### The tokenizer
+
+ ```python
+ from transformers import LlamaTokenizer
+
+ tokenizer = LlamaTokenizer.from_pretrained("wordcab/llama-natural-instructions-7b")
+ tokenizer.padding_side = "left"  # decoder-only models expect the prompt at the end of the sequence
+ tokenizer.pad_token_id = 0  # use the <unk> token id as the padding token
+ ```
+
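+ A quick sanity check (hypothetical, not part of the original instructions) to confirm that padding is applied on the left when batching prompts of different lengths:
+
+ ```python
+ batch = tokenizer(
+     ["short prompt", "a slightly longer prompt"],
+     return_tensors="pt",
+     padding=True,
+ )
+ print(batch["input_ids"][0])       # the shorter sequence starts with pad ids (0) on the left
+ print(batch["attention_mask"][0])  # padded positions are masked out with 0
+ ```
+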
+ #### Load the model with the adapters
+
+ ```python
+ import torch
+ from peft import PeftModel
+ from transformers import LlamaForCausalLM
+
+ # Load the base LLaMA weights in 8-bit.
+ model = LlamaForCausalLM.from_pretrained(
+     "decapoda-research/llama-7b-hf",
+     load_in_8bit=True,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+ # Apply the LoRA adapters on top of the base model.
+ model = PeftModel.from_pretrained(
+     model,
+     "wordcab/llama-natural-instructions-7b",
+     torch_dtype=torch.float16,
+     device_map={"": 0},
+ )
+ ```
+
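+ If you want to verify that the 8-bit loading worked, `get_memory_footprint` reports the model size in bytes; this is an optional check, not part of the original instructions:
+
+ ```python
+ print(f"Memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")
+ # The 8-bit model should be several times smaller than the ~13 GB fp16 checkpoint.
+ ```
+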
+ #### Load the full model
+
+ ⚠️ Work in progress...
+
+ ```python
+ model = LlamaForCausalLM.from_pretrained(
+     "wordcab/llama-natural-instructions-7b",
+     load_in_8bit=True,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+ ```
+
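+ Until merged weights are published, one way to build a standalone checkpoint yourself is to fold the LoRA adapters into the base weights with peft. This is only a sketch, assuming a peft version that exposes `merge_and_unload`; it is not part of the original instructions, and the output path is hypothetical:
+
+ ```python
+ import torch
+ from peft import PeftModel
+ from transformers import LlamaForCausalLM
+
+ # Merging requires the full-precision (fp16) base weights, not the 8-bit ones.
+ base = LlamaForCausalLM.from_pretrained(
+     "decapoda-research/llama-7b-hf",
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+ peft_model = PeftModel.from_pretrained(base, "wordcab/llama-natural-instructions-7b")
+ merged = peft_model.merge_and_unload()  # folds the LoRA deltas into the base weights
+ merged.save_pretrained("./llama-natural-instructions-7b-merged")  # hypothetical output path
+ ```
+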
+ #### Evaluation mode
+
+ Don't forget to put the model in evaluation mode, and if you are using PyTorch 2.0 or higher, you can also compile it with `torch.compile` for faster inference.
+
+ ```python
+ model.eval()
+ if torch.__version__ >= "2":
+     model = torch.compile(model)
+ ```
+
+ #### Generate the response
+
+ ```python
+ from transformers import GenerationConfig
+
+ # Default generation settings; adjust temperature, top_p, etc. as needed.
+ generation_config = GenerationConfig()
+
+ prompt = generate_prompt(
+     "In this task, you have to analyze the full sentences and do reasoning and quick maths to find the correct answer.",
+     "You are now a superbowl star. You are the quarterback of the team. Your team is down by 3 points. You are in the last 2 minutes of the game. The other team has a score of 28. What is the score of your team?",
+ )
+ inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=2048)
+ input_ids = inputs["input_ids"].to(model.device)
+
+ with torch.no_grad():
+     gen_outputs = model.generate(
+         input_ids=input_ids,
+         generation_config=generation_config,
+         return_dict_in_generate=True,
+         output_scores=True,
+         max_new_tokens=50,
+     )
+
+ s = gen_outputs.sequences[0]
+ output = tokenizer.decode(s, skip_special_tokens=True)
+ response = get_response(output)
+ print(response)
+ # >>> 25
+ ```
+
+ You can try other prompts that are not maths related as well! :hugs:
+
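+ For instance, here is a hypothetical sentiment-classification style prompt (the instruction and input are made up, not taken from the dataset); it goes through the exact same tokenization and `generate` call as above:
+
+ ```python
+ # Hypothetical, illustrative prompt; reuse the tokenizer / generate code from the previous block.
+ prompt = generate_prompt(
+     "In this task, you are given a sentence and you have to classify its sentiment as positive or negative.",
+     "I really enjoyed the movie, the acting was fantastic!",
+ )
+ ```
+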
+ ## Benchmark
+
+ We benchmarked our model on the following tasks: [BoolQ](https://huggingface.co/datasets/boolq), [PIQA](https://huggingface.co/datasets/piqa), [WinoGrande](https://huggingface.co/datasets/winogrande), [OpenBookQA](https://huggingface.co/datasets/openbookqa).
+
+ | Model | BoolQ | PIQA | WinoGrande | OpenBookQA | Precision | Inference time (s) |
+ | --- | --- | --- | --- | --- | --- | --- |
+ | Original LLaMA 7B | 76.5 | 79.8 | 70.1 | 57.2 | fp32 | 3 |
+ | Original LLaMA 13B | 78.1 | 80.1 | 73 | 56.4 | fp32 | >5 |
+ | LoRA LLaMA 7B | 63.9 | 51.3 | 48.9 | 31.4 | 8bit | 0.65 |
+ | LoRA LLaMA 13B | 70 | 63.93 | 51.6 | 50.4 | 8bit | 1.2 |
+
+ Overall, our LoRA models are less performant than the original models from Meta, when compared with the results reported in the [original paper](https://arxiv.org/pdf/2302.13971.pdf).
+
+ The performance degradation comes from loading the model in 8-bit and from using the adapters from the LoRA training.
+ Thanks to the 8-bit quantization, the model is roughly 4 times faster than the original model, and the results are still decent.
+
+ Some complex tasks, like WinoGrande and OpenBookQA, remain harder to solve with the adapters.
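+
+ If you want to sanity-check the inference-time column on your own hardware, a minimal timing sketch (not the harness used to produce the table, and reusing `model` and `input_ids` from the inference section above) is:
+
+ ```python
+ import time
+
+ start = time.perf_counter()
+ with torch.no_grad():
+     model.generate(input_ids=input_ids, max_new_tokens=50)
+ print(f"Generation took {time.perf_counter() - start:.2f} s")
+ ```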
adapter_config.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "base_model_name_or_path": "decapoda-research/llama-7b-hf",
+   "bias": "none",
+   "enable_lora": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "lora_alpha": 16,
+   "lora_dropout": 0.05,
+   "merge_weights": false,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 8,
+   "target_modules": [
+     "q_proj",
+     "v_proj"
+   ],
+   "task_type": "CAUSAL_LM"
+ }
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:247a006256134d733bcaa5b17848a3e7d277b53088f37c2ce4f095d2fbfbdc56
+ size 16822989
llama-natural-instructions-removebg-preview.png ADDED