|
--- |
|
library_name: transformers |
|
datasets: |
|
- Salesforce/xlam-function-calling-60k |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- Function-Calling Agent |
|
- LoRA |
|
- BitsAndBytes
|
- Llama-3-8B-Instruct |
|
- APIGen Function-Calling |
|
--- |
|
|
|
# Meta-Llama-3-8B-Instruct_bitsandbytes_4bit fine-tuned on Salesforce/xlam-function-calling-60k |
|
|
|
Function-Calling Agent |
|
|
|
# LoRA Adapter Head
|
|
|
Parameter-Efficient Fine-Tuning (PEFT) of a 4-bit quantized Meta-Llama-3-8B-Instruct on the Salesforce/xlam-function-calling-60k dataset. A sketch of the 4-bit quantization setup follows the list below.
|
|
|
- **Language(s) (NLP):** English |
|
- **License:** openrail |
|
- **Quantization:** BitsAndBytes
|
- **PEFT:** LoRA |
|
- **Finetuned from model [SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit](https://huggingface.co/SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit)** |
|
- **Dataset:** [Salesforce/xlam-function-calling-60k dataset](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) |
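The base model above is a bitsandbytes 4-bit quantization of meta-llama/Meta-Llama-3-8B-Instruct. A minimal sketch of producing an equivalent 4-bit model directly, assuming NF4 quantization with FP16 compute (the exact `BitsAndBytesConfig` of the published base model is an assumption):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed quantization settings; the published 4-bit base model may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 is the common choice; an assumption here
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```

Loading the pre-quantized repo, as shown in the usage section below, avoids this step.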
|
|
|
## Intended uses & limitations |
|
|
|
Demonstrates the efficacy of quantization combined with PEFT. Implemented as a personal project.
|
|
|
|
|
# How to use |
|
|
|
## Install Required Libraries |
|
```python |
|
!pip install transformers accelerate "bitsandbytes>0.37.0"  # quotes keep ">" from acting as shell redirection
|
!pip install peft |
|
``` |
|
## Setup Adapter with Base Model |
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the 4-bit quantized base model; device_map="auto" handles device placement.
base_model = AutoModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit", device_map="auto"
)
# Attach the LoRA adapter head to the quantized base.
model = PeftModel.from_pretrained(base_model, "SwastikM/Meta-Llama3-8B-Chat-Adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Note: .to("cuda") is not supported for bitsandbytes-quantized models;
# device_map="auto" above already places the model.
model.eval()
|
``` |
|
|
|
## Setup Template and Infer |
|
```python
x1 = {"role": "system", "content": """You are an APIGen Function-Calling Tool. You will be provided with a user query and associated tools for answering the query.
|
query (string): The query or problem statement. |
|
tools (array): An array of available tools that can be used to solve the query. |
|
Each tool is represented as an object with the following properties: |
|
name (string): The name of the tool. |
|
description (string): A brief description of what the tool does. |
|
parameters (object): An object representing the parameters required by the tool. |
|
Each parameter is represented as a key-value pair, where the key is the parameter name and the value is an object with the following properties: |
|
type (string): The data type of the parameter (e.g., "int", "float", "list"). |
|
description (string): A brief description of the parameter. |
|
required (boolean): Indicates whether the parameter is required or optional. |
|
You will provide the answers array.

The answers array specifies the tool and the arguments used to generate each answer."""}
|
x2 = {"role": "user", "content": None}
x3 = {"role": "assistant", "content": None}  # assistant-turn placeholder; unused in the inference example below
user_template = 'Query: {Q} Tools: {T}'
response_template = '{A}'  # response template; unused in the inference example below
|
Q = "Where can I find live giveaways for beta access and games?" |
|
T = """[{"name": "live_giveaways_by_type", "description": "Retrieve live giveaways from the GamerPower API based on the specified type.", "parameters": {"type": {"description": "The type of giveaways to retrieve (e.g., game, loot, beta).", "type": "str", "default": "game"}}}]""" |
|
|
|
|
|
x2['content'] = user_template.format(Q=Q, T=T)
|
prompts = [x1,x2] |
|
input_ids = tokenizer.apply_chat_template( |
|
prompts, |
|
add_generation_prompt=True, |
|
return_tensors="pt" |
|
).to(model.device) |
|
|
|
terminators = [ |
|
tokenizer.eos_token_id, |
|
tokenizer.convert_tokens_to_ids("<|eot_id|>") |
|
] |
|
|
|
outputs = model.generate( |
|
input_ids, |
|
max_new_tokens=256, |
|
eos_token_id=terminators |
|
) |
|
|
|
response = outputs[0][input_ids.shape[-1]:] |
|
print(tokenizer.decode(response, skip_special_tokens=True)) |
|
``` |
|
|
|
|
|
## Size Comparison |
|
|
|
The table compares the VRAM requirements for loading and training the FP16 base model
against the 4-bit bnb-quantized model with PEFT.
The base-model values are taken from Hugging Face's [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).
|
|
|
|
|
|
|
|
|
| Model                  | Total Size | Training Using Adam |
|------------------------|------------|---------------------|
| Base Model (FP16)      | 28.21 GB   | 56.42 GB            |
| 4-bit Quantized + PEFT | 5.21 GB    | 13 GB               |
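These estimates can also be reproduced from the command line with Accelerate's memory estimator, a thin wrapper over the calculator above (flag names follow recent `accelerate` releases and may vary by version):

```python
!accelerate estimate-memory meta-llama/Meta-Llama-3-8B-Instruct --library_name transformers
```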
|
|
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
**Dataset:** [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)
|
|
|
Trained on the `instruction` column of 20,00 randomly shuffled rows; a loading sketch follows.
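A minimal loading sketch using the standard `datasets` API; the subset size `N` is a hypothetical placeholder (the count above is printed ambiguously) and the shuffle seed is illustrative:

```python
from datasets import load_dataset

N = 20000  # hypothetical subset size; the card's "20,00" figure is ambiguous

# Load the function-calling dataset and take a random subset.
ds = load_dataset("Salesforce/xlam-function-calling-60k", split="train")
subset = ds.shuffle(seed=42).select(range(N))
```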
|
|
|
### Training Procedure |
|
|
|
A custom training loop with Hugging Face Accelerate; a sketch follows the hyperparameters below.
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
- **Optimizer:** AdamW |
|
- **lr:** 2e-5 |
|
- **decay:** linear |
|
- **batch_size:** 1 |
|
- **gradient_accumulation_steps:** 2 |
|
- **fp16:** True |
|
|
|
LoraConfig |
|
- ***r:*** 8 |
|
- ***lora_alpha:*** 32 |
|
- ***task_type:*** TaskType.CAUSAL_LM |
|
- ***lora_dropout:*** 0.1 |
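A minimal sketch of the training setup implied by the hyperparameters above. The `LoraConfig` values, optimizer, learning rate, schedule type, precision, and accumulation steps mirror the lists; `base_model`, `train_dataloader`, `num_training_steps`, and the warmup value are placeholders:

```python
from torch.optim import AdamW
from accelerate import Accelerator
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import get_linear_schedule_with_warmup

# LoRA configuration matching the values listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type=TaskType.CAUSAL_LM,
)

# base_model: the 4-bit quantized model (placeholder; see the usage section above).
base_model = prepare_model_for_kbit_training(base_model)
model = get_peft_model(base_model, lora_config)

optimizer = AdamW(model.parameters(), lr=2e-5)
# Linear decay as listed above; num_training_steps is a placeholder.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)

accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=2)
model, optimizer, train_dataloader, scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, scheduler
)

model.train()
for batch in train_dataloader:
    with accelerator.accumulate(model):  # accumulates gradients over 2 steps
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```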
|
|
|
#### Hardware |
|
|
|
- **GPU:** P100 |
|
|
|
## Acknowledgment |
|
|
|
- Thanks to [Merve Noyan](https://huggingface.co/blog/merve/quantization) for the precise intro to quantization.

- Thanks to the [Hugging Face Team](https://huggingface.co/blog/gptq-integration#fine-tune-quantized-models-with-peft) for the [notebook](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing) on GPTQ.

- Thanks to the [Hugging Face Team](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for the blog on 4-bit transformers with bitsandbytes.

- Thanks to [Salesforce](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) for the marvelous dataset.
|
|
|
## Model Card Authors |
|
Swastik Maiti |
|
|
|
|