---
library_name: peft
base_model: TheBloke/Llama-2-7b-Chat-GPTQ
pipeline_tag: text-generation
inference: false
license: openrail
language:
- en
datasets:
- flytech/python-codes-25k
tags:
- text2code
- LoRA
- GPTQ
- Llama-2-7B-Chat
- text2python
- instruction2code
---
# Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K
Generates Python code that accomplishes a given instruction.
## LoRA Adapter Head
### Description
Parameter-Efficient Fine-Tuning (PEFT) of the 4-bit GPTQ-quantized Llama-2-7b-Chat from TheBloke/Llama-2-7b-Chat-GPTQ on the flytech/python-codes-25k dataset.
- **Language(s) (NLP):** English
- **License:** openrail
- **Quantization:** GPTQ 4-bit
- **PEFT:** LoRA
- **Finetuned from model:** [TheBloke/Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ)
- **Dataset:** [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)
## Intended uses & limitations
Explores the efficacy of quantization and PEFT. Implemented as a personal project.
### How to use
```
The quantized model is fine-tuned with PEFT, so only the trained adapter is available.
Merging a LoRA adapter into a GPTQ-quantized model is not yet supported.
Instead of loading a single fine-tuned model, we load the base model and apply the fine-tuned adapter on top.
```
```python
# Natural-language task to be turned into Python code
instruction = "Help me set up my daily to-do list!"
```
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 4-bit GPTQ base model (device_map="auto" places it on the GPU)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto"
)
# Apply the fine-tuned LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "SwastikM/Llama-2-7B-Chat-text2code")
tokenizer = AutoTokenizer.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")

inputs = tokenizer(instruction, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_new_tokens=500, do_sample=False, num_beams=1)
code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(code)
```
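Since the base model is a Llama-2 chat variant, wrapping the instruction in the Llama-2 `[INST]` chat format may yield better completions. The card does not document the prompt format used during fine-tuning, so the template below is an assumption, not the confirmed training format:

```python
# Hypothetical prompt wrapper using the standard Llama-2 chat format;
# the exact prompt used during fine-tuning is an assumption.
prompt = f"[INST] {instruction} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_new_tokens=500, do_sample=False, num_beams=1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```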
## Training Details
### Training Data
[flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)
### Training Procedure
Custom training loop with Hugging Face Accelerate; a minimal sketch follows the hyperparameters below.
#### Training Hyperparameters
- **Optimizer:** AdamW
- **lr:** 2e-5
- **decay:** linear
- **batch_size:** 4
- **gradient_accumulation_steps:** 8
- **global_step:** 625
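
The training script is not included in this card, so the following is a minimal sketch of an Accelerate training loop using the hyperparameters above. The LoRA settings (`r`, `lora_alpha`, `target_modules`) and the dataset's `text` column are illustrative assumptions, not the exact configuration used:

```python
import torch
from accelerate import Accelerator
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from torch.utils.data import DataLoader
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          get_linear_schedule_with_warmup)

# Base model and tokenizer (GPTQ 4-bit)
base = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ")
tokenizer.pad_token = tokenizer.eos_token

# Attach a LoRA adapter head (r/alpha/target_modules are assumptions)
base = prepare_model_for_kbit_training(base)
model = get_peft_model(base, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))

# Tokenize the dataset; the `text` column name is an assumption
def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=512,
                    padding="max_length")
    out["labels"] = out["input_ids"].copy()
    return out

data = load_dataset("flytech/python-codes-25k", split="train")
data = data.map(tokenize, batched=True, remove_columns=data.column_names)
data.set_format("torch")
loader = DataLoader(data, batch_size=4, shuffle=True)

# Hyperparameters from the card: AdamW, lr 2e-5, linear decay,
# gradient_accumulation_steps 8, 625 global steps
accelerator = Accelerator(gradient_accumulation_steps=8)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scheduler = get_linear_schedule_with_warmup(optimizer, 0, 625)
model, optimizer, loader, scheduler = accelerator.prepare(
    model, optimizer, loader, scheduler
)

model.train()
for batch in loader:
    with accelerator.accumulate(model):
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```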
#### Hardware
- **GPU:** P100
## Additional Information
- ***Github:*** [Repository]()
- ***Intro to quantization:*** [Blog](https://huggingface.co/blog/merve/quantization)
- ***Emergent Features:*** [Blog](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features)
- ***GPTQ Paper:*** [GPTQ](https://arxiv.org/pdf/2210.17323)
- ***bitsandbytes and further:*** [LLM.int8()](https://arxiv.org/pdf/2208.07339)
## Acknowledgment
Thanks to [@Merve Noyan](https://huggingface.co/blog/merve/quantization) for the precise intro to quantization.
Thanks to the [@HuggingFace Team](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing#scrollTo=vT0XjNc2jYKy) for the coding guide on GPTQ.
## Model Card Authors
Swastik Maiti