---
library_name: transformers
datasets:
- Salesforce/xlam-function-calling-60k
language:
- en
pipeline_tag: text-generation
tags:
- Function-Calling Agent
- LoRA
- BitsAndByes
- Llama-3-8B-Instruct
- APIGen Function-Calling 
---

# Meta-Llama-3-8B-Instruct_bitsandbytes_4bit fine-tuned on Salesforce/xlam-function-calling-60k

Function-Calling Agent

# LoRA Adpater Head

Parameter Efficient Finetuning (PEFT) a 4bit quantized Meta-Llama-3-8B-Instruct on Salesforce/xlam-function-calling-60k dataset.

- **Language(s) (NLP):** English
- **License:** openrail
- **Qunatization:** BitsAndBytes
- **PEFT:** LoRA
- **Finetuned from model [SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit](https://huggingface.co/SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit)**
- **Dataset:** [Salesforce/xlam-function-calling-60k dataset](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)

## Intended uses & limitations

Addressing the efficay of Quantization and PEFT. Implemented as a personal Project.


# How to use

## Install Required Libraries
```python
!pip install transformers accelerate bitsandbytes>0.37.0
!pip install peft
```
## Setup Adapter with Base Model
```Python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer,AutoModelForCausalLM
from peft import PeftModel, PeftConfig, get_peft_model
import torch

base_model = AutoModelForCausalLM.from_pretrained("SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit",device_map="auto")
model = PeftModel.from_pretrained(base_model, "SwastikM/Meta-Llama3-8B-Chat-Adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

model = model.to("cuda")
model.eval()
```

## Setup Template and Infer
```Python
x1 = {"role": "system", "content": """You are a APIGen Function Calling Tool. You will br provided with a user query and associated tools for answering the query.
    query (string): The query or problem statement.
    tools (array): An array of available tools that can be used to solve the query.
    Each tool is represented as an object with the following properties:
        name (string): The name of the tool.
        description (string): A brief description of what the tool does.
        parameters (object): An object representing the parameters required by the tool.
            Each parameter is represented as a key-value pair, where the key is the parameter name and the value is an object with the following properties:
                type (string): The data type of the parameter (e.g., "int", "float", "list").
                description (string): A brief description of the parameter.
                required (boolean): Indicates whether the parameter is required or optional.
    You will provide the Answer array.
        Answers array provides the specific tool and arguments used to generate each answer."""}
x2 = {"role": "user", "content": None}
x3 = {"role": "assistant", "content": None}
user_template = 'Query: {Q} Tools: {T}'
response_template = '{A}'
Q = "Where can I find live giveaways for beta access and games?"
T = """[{"name": "live_giveaways_by_type", "description": "Retrieve live giveaways from the GamerPower API based on the specified type.", "parameters": {"type": {"description": "The type of giveaways to retrieve (e.g., game, loot, beta).", "type": "str", "default": "game"}}}]"""


x2['content'] = f'{user_template.format(Q=Q,T=T)}'
prompts = [x1,x2]
input_ids = tokenizer.apply_chat_template(
    prompts,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators
)

response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```


## Size Comparison

The table shows comparison VRAM requirements for loading and training
of FP16 Base Model and 4bit bnb quantized model with PEFT.
The value for base model referenced from [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator)
from HuggingFace


| Model                   | Total Size  | Training Using Adam |
| ------------------------|-------------| --------------------| 
| Base Model              | 28.21 GB    | 56.42 GB            |
| 4bitQuantized+PEFT      | 5.21 GB     | 13 GB               |


## Training Details

### Training Data

****Dataset:**** [Salesforce/xlam-function-calling-60k dataset](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)

Trained on `instruction` column of 20,00 randomly shuffled data.

### Training Procedure

HuggingFace Accelerate with Training Loop.


#### Training Hyperparameters

- **Optimizer:** AdamW
- **lr:** 2e-5
- **decay:** linear
- **batch_size:** 1
- **gradient_accumulation_steps:** 2
- **fp16:** True

 LoraConfig
 - ***r:*** 8
 - ***lora_alpha:*** 32
 - ***task_type:*** TaskType.CAUSAL_LM
 - ***lora_dropout:*** 0.1

#### Hardware

- **GPU:** P100

## Acknowledgment

- Thanks to [@AMerve Noyan](https://huggingface.co/blog/merve/quantization) for precise intro.
- Thanks to [@HuggungFace Team](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for the Blog.
- Thanks to [@Salesforce](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) for the marvelous dataset.

## Model Card Authors
Swastik Maiti