---
library_name: peft
base_model: TheBloke/Llama-2-7b-Chat-GPTQ
pipeline_tag: text-generation
inference: false
license: openrail
language:
- en
datasets:
- flytech/python-codes-25k
tags:
- text2code
- LoRA
- GPTQ
- Llama-2-7B-Chat
- text2python
- instruction2code
---

# Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K

Generates Python code that accomplishes a given natural-language instruction.


## LoRA Adapter Head

### Description

Parameter-Efficient Fine-Tuning (PEFT) of the 4-bit GPTQ-quantized Llama-2-7b-Chat from TheBloke/Llama-2-7b-Chat-GPTQ on the flytech/python-codes-25k dataset.

- **Language(s) (NLP):** English
- **License:** openrail
- **Quantization:** GPTQ 4-bit
- **PEFT:** LoRA
- **Finetuned from model:** [TheBloke/Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ)
- **Dataset:** [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)
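
To illustrate why LoRA is parameter-efficient, the back-of-the-envelope arithmetic below estimates the adapter size. The card does not state the adapter rank or target modules, so the values used here (rank 16 on the query and value projections of Llama-2-7B's 32 layers, hidden size 4096) are assumptions for illustration only:

```python
hidden_size = 4096     # Llama-2-7B hidden dimension
num_layers = 32        # Llama-2-7B decoder layers
rank = 16              # hypothetical LoRA rank (not stated in this card)
targets_per_layer = 2  # hypothetical targets, e.g. q_proj and v_proj

# Each adapted weight matrix gains two low-rank factors:
# A (rank x d_in) and B (d_out x rank), so rank * (d_in + d_out) new params.
params_per_matrix = rank * (hidden_size + hidden_size)
lora_params = num_layers * targets_per_layer * params_per_matrix

base_params = 7_000_000_000  # ~7B parameters in the base model
print(lora_params)                                 # 8388608
print(round(100 * lora_params / base_params, 3))   # ~0.12 (% of base)
```

Under these assumptions the adapter trains roughly 8.4M parameters, about 0.12% of the 7B base model, which is what makes fine-tuning feasible on a single GPU.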

## Intended uses & limitations

Explores the efficacy of quantization and PEFT. Implemented as a personal project.

### How to use

The quantized model was fine-tuned with PEFT, so only the trained adapter is published here. Merging a LoRA adapter into a GPTQ-quantized model is not yet supported, so instead of loading a single fine-tuned model, load the base model and attach the fine-tuned adapter on top.

```python
instruction = "Help me set up my daily to-do list!"
```
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

config = PeftConfig.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto")
model = PeftModel.from_pretrained(model, "SwastikM/Llama-2-7B-Chat-text2code")
tokenizer = AutoTokenizer.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")

inputs = tokenizer(instruction, return_tensors="pt").input_ids.to('cuda')
outputs = model.generate(inputs, max_new_tokens=500, do_sample=False, num_beams=1)
code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(code)
```


## Training Details

### Training Data

[flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)

### Training Procedure

Custom training loop written with Hugging Face Accelerate.


#### Training Hyperparameters

- **Optimizer:** AdamW
- **lr:** 2e-5
- **decay:** linear
- **batch_size:** 4
- **gradient_accumulation_steps:** 8
- **global_step:** 625
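
The hyperparameters above imply an effective batch size and a total number of training examples that can be checked with simple arithmetic (assuming a single device, consistent with the hardware section below):

```python
batch_size = 4
gradient_accumulation_steps = 8
global_steps = 625

# One optimizer step consumes batch_size * gradient_accumulation_steps examples.
effective_batch_size = batch_size * gradient_accumulation_steps
examples_seen = global_steps * effective_batch_size

print(effective_batch_size)  # 32
print(examples_seen)         # 20000
```

So the run processed about 20,000 examples, i.e. roughly one pass over the training split of the ~25k-example dataset.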


#### Hardware

- **GPU:** P100


## Additional Information

- ***Github:*** [Repository]()
- ***Intro to quantization:*** [Blog](https://huggingface.co/blog/merve/quantization)
- ***Emergent Feature:*** [Academic](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features)
- ***GPTQ Paper:*** [GPTQ](https://arxiv.org/pdf/2210.17323)
- ***bitsandbytes and further:*** [LLM.int8()](https://arxiv.org/pdf/2208.07339)

## Acknowledgment

Thanks to [Merve Noyan](https://huggingface.co/blog/merve/quantization) for the precise intro.
Thanks to the [Hugging Face team](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing#scrollTo=vT0XjNc2jYKy) for the coding guide on GPTQ.


## Model Card Authors

Swastik Maiti