---
library_name: transformers
datasets:
- Salesforce/xlam-function-calling-60k
language:
- en
pipeline_tag: text-generation
tags:
- Function-Calling Agent
- LoRA
- BitsAndBytes
- Llama-3-8B-Instruct
- APIGen Function-Calling
---
# Meta-Llama-3-8B-Instruct_bitsandbytes_4bit fine-tuned on Salesforce/xlam-function-calling-60k
Function-Calling Agent
# LoRA Adapter Head
Parameter-Efficient Fine-Tuning (PEFT) of a 4-bit quantized Meta-Llama-3-8B-Instruct on the Salesforce/xlam-function-calling-60k dataset.
- **Language(s) (NLP):** English
- **License:** openrail
- **Quantization:** BitsAndBytes
- **PEFT:** LoRA
- **Fine-tuned from model:** [SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit](https://huggingface.co/SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit)
- **Dataset:** [Salesforce/xlam-function-calling-60k dataset](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)
## Intended uses & limitations
Demonstrates the efficacy of quantization combined with PEFT. Implemented as a personal project.
# How to use
## Install Required Libraries
```python
# quote the version specifier so the shell does not treat ">" as redirection
!pip install transformers accelerate "bitsandbytes>0.37.0"
!pip install peft
```
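Note that 4-bit bitsandbytes loading runs on CUDA GPUs only; a quick sanity check before loading the model:
```python
import torch

# bitsandbytes 4-bit quantization requires a CUDA-capable GPU
assert torch.cuda.is_available(), "A CUDA GPU is required for 4-bit loading"
```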
## Setup Adapter with Base Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the pre-quantized 4-bit base model; device_map="auto" places it on the GPU
base_model = AutoModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit",
    device_map="auto",
)

# Attach the LoRA adapter weights to the quantized base
model = PeftModel.from_pretrained(base_model, "SwastikM/Meta-Llama3-8B-Chat-Adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model.eval()
```
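Alternatively, if you prefer to quantize the stock Meta checkpoint at load time rather than use the pre-quantized one, a sketch along these lines should work (gated access to meta-llama/Meta-Llama-3-8B-Instruct and the NF4 settings are assumptions, not taken from the original setup):
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Quantize the original FP16 checkpoint to 4-bit at load time
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumed quantization type
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "SwastikM/Meta-Llama3-8B-Chat-Adapter")
```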
## Setup Template and Infer
```python
x1 = {"role": "system", "content": """You are an APIGen Function Calling Tool. You will be provided with a user query and associated tools for answering the query.
query (string): The query or problem statement.
tools (array): An array of available tools that can be used to solve the query.
Each tool is represented as an object with the following properties:
name (string): The name of the tool.
description (string): A brief description of what the tool does.
parameters (object): An object representing the parameters required by the tool.
Each parameter is represented as a key-value pair, where the key is the parameter name and the value is an object with the following properties:
type (string): The data type of the parameter (e.g., "int", "float", "list").
description (string): A brief description of the parameter.
required (boolean): Indicates whether the parameter is required or optional.
You will provide the Answers array.
The Answers array specifies the tool and arguments used to generate each answer."""}
x2 = {"role": "user", "content": None}
x3 = {"role": "assistant", "content": None}  # assistant-side turn; unused in this inference example
user_template = 'Query: {Q} Tools: {T}'
response_template = '{A}'  # answer template; unused in this inference example
Q = "Where can I find live giveaways for beta access and games?"
T = """[{"name": "live_giveaways_by_type", "description": "Retrieve live giveaways from the GamerPower API based on the specified type.", "parameters": {"type": {"description": "The type of giveaways to retrieve (e.g., game, loot, beta).", "type": "str", "default": "game"}}}]"""
x2['content'] = f'{user_template.format(Q=Q,T=T)}'
prompts = [x1,x2]
input_ids = tokenizer.apply_chat_template(
    prompts,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")  # Llama 3 end-of-turn token
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators  # stop on either terminator
)

# Decode only the newly generated tokens (everything after the prompt)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
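Since the system prompt asks for an answers array, the decoded text can usually be post-processed as JSON. A minimal, hypothetical parsing step (the exact output format depends on the generation, so the fallback keeps the raw text):
```python
import json

raw = tokenizer.decode(response, skip_special_tokens=True)

# Assumes the model emits a JSON array of {"name": ..., "arguments": {...}}
# objects, mirroring the xlam-function-calling-60k answer format
try:
    for call in json.loads(raw):
        print(call["name"], call.get("arguments", {}))
except json.JSONDecodeError:
    print("Output was not valid JSON:", raw)
```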
## Size Comparison
The table compares the VRAM required to load and train the FP16 base model against the 4-bit bitsandbytes-quantized model with PEFT. Base-model values are taken from the Hugging Face [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).
| Model                  | Total Size | Training Using Adam |
| ---------------------- | ---------- | ------------------- |
| FP16 Base Model        | 28.21 GB   | 56.42 GB            |
| 4-bit Quantized + PEFT | 5.21 GB    | 13 GB               |
## Training Details
### Training Data
**Dataset:** [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)
Trained on the `instruction` column of 20,00 randomly shuffled rows.
### Training Procedure
Custom training loop built with Hugging Face Accelerate.
#### Training Hyperparameters
- **Optimizer:** AdamW
- **lr:** 2e-5
- **decay:** linear
- **batch_size:** 1
- **gradient_accumulation_steps:** 2
- **fp16:** True
LoraConfig (see the sketch after this list):
- **r:** 8
- **lora_alpha:** 32
- **task_type:** TaskType.CAUSAL_LM
- **lora_dropout:** 0.1
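The original training script is not included in this card; the following is a minimal sketch of how the settings above plausibly fit together (the step count, target modules left at PEFT defaults, and the pre-built `train_dataloader` are assumptions):
```python
from accelerate import Accelerator
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# LoRA configuration matching the hyperparameters listed above
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type=TaskType.CAUSAL_LM,
)

model = prepare_model_for_kbit_training(base_model)  # base_model: the 4-bit model from the setup section
model = get_peft_model(model, lora_config)

accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=2)
optimizer = AdamW(model.parameters(), lr=2e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=1000  # placeholder step count
)

# train_dataloader (batch_size=1) must be built from the tokenized dataset beforehand
model, optimizer, train_dataloader, scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, scheduler
)

model.train()
for batch in train_dataloader:
    with accelerator.accumulate(model):  # accumulate gradients over 2 steps
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```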
#### Hardware
- **GPU:** P100
## Acknowledgment
- Thanks to [@Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise introduction to quantization.
- Thanks to the [@HuggingFace Team](https://huggingface.co/blog/gptq-integration#fine-tune-quantized-models-with-peft) for the [notebook](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing) on GPTQ.
- Thanks to the [@HuggingFace Team](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for the blog on 4-bit quantization.
- Thanks to [@Salesforce](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) for the marvelous dataset.
## Model Card Authors
Swastik Maiti