---
library_name: transformers
datasets:
- Salesforce/xlam-function-calling-60k
language:
- en
pipeline_tag: text-generation
tags:
- Function-Calling Agent
- LoRA
- BitsAndBytes
- Llama-3-8B-Instruct
- APIGen Function-Calling
---
# Meta-Llama-3-8B-Instruct_bitsandbytes_4bit fine-tuned on Salesforce/xlam-function-calling-60k
Function-Calling Agent
# LoRA Adapter Head
Parameter-Efficient Fine-Tuning (PEFT) of a 4-bit quantized Meta-Llama-3-8B-Instruct on the Salesforce/xlam-function-calling-60k dataset.
- **Language(s) (NLP):** English
- **License:** openrail
- **Quantization:** BitsAndBytes
- **PEFT:** LoRA
- **Fine-tuned from model:** [SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit](https://huggingface.co/SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit)
- **Dataset:** [Salesforce/xlam-function-calling-60k dataset](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)
## Intended uses & limitations
Demonstrates the efficacy of quantization combined with PEFT. Implemented as a personal project.
# How to use
## Install Required Libraries
```python
# quote the version specifier so the shell does not treat ">" as redirection
!pip install transformers accelerate "bitsandbytes>0.37.0"
!pip install peft
```
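Note that 4-bit bitsandbytes loading runs on CUDA GPUs only; a quick sanity check before loading the model:
```python
import torch

# bitsandbytes 4-bit quantization requires a CUDA-capable GPU
assert torch.cuda.is_available(), "A CUDA GPU is required for 4-bit loading"
```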
## Setup Adapter with Base Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the pre-quantized 4-bit base model; device_map="auto" places it on the GPU
base_model = AutoModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit",
    device_map="auto",
)

# Attach the LoRA adapter weights to the quantized base
model = PeftModel.from_pretrained(base_model, "SwastikM/Meta-Llama3-8B-Chat-Adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model.eval()
```
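Alternatively, if you prefer to quantize the stock Meta checkpoint at load time rather than use the pre-quantized one, a sketch along these lines should work (gated access to meta-llama/Meta-Llama-3-8B-Instruct and the NF4 settings are assumptions, not taken from the original setup):
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Quantize the original FP16 checkpoint to 4-bit at load time
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumed quantization type
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "SwastikM/Meta-Llama3-8B-Chat-Adapter")
```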
## Setup Template and Infer
```python
x1 = {"role": "system", "content": """You are an APIGen Function Calling Tool. You will be provided with a user query and associated tools for answering the query.
query (string): The query or problem statement.
tools (array): An array of available tools that can be used to solve the query.
Each tool is represented as an object with the following properties:
name (string): The name of the tool.
description (string): A brief description of what the tool does.
parameters (object): An object representing the parameters required by the tool.
Each parameter is represented as a key-value pair, where the key is the parameter name and the value is an object with the following properties:
type (string): The data type of the parameter (e.g., "int", "float", "list").
description (string): A brief description of the parameter.
required (boolean): Indicates whether the parameter is required or optional.
You will provide the Answers array.
The Answers array specifies the tool and arguments used to generate each answer."""}
x2 = {"role": "user", "content": None}
x3 = {"role": "assistant", "content": None}  # assistant-side turn; unused in this inference example
user_template = 'Query: {Q} Tools: {T}'
response_template = '{A}'  # answer template; unused in this inference example
Q = "Where can I find live giveaways for beta access and games?"
T = """[{"name": "live_giveaways_by_type", "description": "Retrieve live giveaways from the GamerPower API based on the specified type.", "parameters": {"type": {"description": "The type of giveaways to retrieve (e.g., game, loot, beta).", "type": "str", "default": "game"}}}]"""
x2['content'] = f'{user_template.format(Q=Q,T=T)}'
prompts = [x1,x2]
input_ids = tokenizer.apply_chat_template(
    prompts,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")  # Llama 3 end-of-turn token
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators  # stop on either terminator
)

# Decode only the newly generated tokens (everything after the prompt)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
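Since the system prompt asks for an answers array, the decoded text can usually be post-processed as JSON. A minimal, hypothetical parsing step (the exact output format depends on the generation, so the fallback keeps the raw text):
```python
import json

raw = tokenizer.decode(response, skip_special_tokens=True)

# Assumes the model emits a JSON array of {"name": ..., "arguments": {...}}
# objects, mirroring the xlam-function-calling-60k answer format
try:
    for call in json.loads(raw):
        print(call["name"], call.get("arguments", {}))
except json.JSONDecodeError:
    print("Output was not valid JSON:", raw)
```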
## Size Comparison
The table compares the VRAM required to load and train the FP16 base model against the 4-bit bitsandbytes-quantized model with PEFT. Base-model values are taken from the Hugging Face [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).
| Model                  | Total Size | Training Using Adam |
| ---------------------- | ---------- | ------------------- |
| FP16 Base Model        | 28.21 GB   | 56.42 GB            |
| 4-bit Quantized + PEFT | 5.21 GB    | 13 GB               |
## Training Details
### Training Data
**Dataset:** [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)
Trained on the `instruction` column of 20,00 randomly shuffled rows.
### Training Procedure
Custom training loop built with Hugging Face Accelerate.
#### Training Hyperparameters
- **Optimizer:** AdamW
- **lr:** 2e-5
- **decay:** linear
- **batch_size:** 1
- **gradient_accumulation_steps:** 2
- **fp16:** True
LoraConfig (see the sketch after this list):
- **r:** 8
- **lora_alpha:** 32
- **task_type:** TaskType.CAUSAL_LM
- **lora_dropout:** 0.1
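The original training script is not included in this card; the following is a minimal sketch of how the settings above plausibly fit together (the step count, target modules left at PEFT defaults, and the pre-built `train_dataloader` are assumptions):
```python
from accelerate import Accelerator
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# LoRA configuration matching the hyperparameters listed above
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type=TaskType.CAUSAL_LM,
)

model = prepare_model_for_kbit_training(base_model)  # base_model: the 4-bit model from the setup section
model = get_peft_model(model, lora_config)

accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=2)
optimizer = AdamW(model.parameters(), lr=2e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=1000  # placeholder step count
)

# train_dataloader (batch_size=1) must be built from the tokenized dataset beforehand
model, optimizer, train_dataloader, scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, scheduler
)

model.train()
for batch in train_dataloader:
    with accelerator.accumulate(model):  # accumulate gradients over 2 steps
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```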
#### Hardware
- **GPU:** P100
## Acknowledgment
- Thanks to [@Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise introduction to quantization.
- Thanks to the [@HuggingFace Team](https://huggingface.co/blog/gptq-integration#fine-tune-quantized-models-with-peft) for the [notebook](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing) on GPTQ.
- Thanks to the [@HuggingFace Team](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for the blog on 4-bit quantization.
- Thanks to [@Salesforce](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) for the marvelous dataset.
## Model Card Authors
Swastik Maiti