---
library_name: transformers
datasets:
- Salesforce/xlam-function-calling-60k
language:
- en
pipeline_tag: text-generation
tags:
- Function-Calling Agent
- LoRA
- BitsAndBytes
- Llama-3-8B-Instruct
- APIGen Function-Calling
---

# Meta-Llama-3-8B-Instruct_bitsandbytes_4bit fine-tuned on Salesforce/xlam-function-calling-60k as a Function-Calling Agent

# LoRA Adapter Head

Parameter-Efficient Fine-Tuning (PEFT) of a 4-bit quantized Meta-Llama-3-8B-Instruct on the Salesforce/xlam-function-calling-60k dataset.

- **Language(s) (NLP):** English
- **License:** openrail
- **Quantization:** BitsAndBytes
- **PEFT:** LoRA
- **Finetuned from model:** [SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit](https://huggingface.co/SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit)
- **Dataset:** [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)

## Intended uses & limitations

Demonstrates the efficacy of quantization and PEFT. Implemented as a personal project.

# How to use

## Install Required Libraries

```python
!pip install transformers accelerate "bitsandbytes>0.37.0"
!pip install peft
```

## Setup Adapter with Base Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the 4-bit quantized base model; device_map="auto" places it on the available GPU.
base_model = AutoModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit",
    device_map="auto",
)

# Attach the LoRA adapter head to the quantized base model.
model = PeftModel.from_pretrained(base_model, "SwastikM/Meta-Llama3-8B-Chat-Adapter")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

model.eval()
```

## Setup Template and Infer

```python
x1 = {"role": "system", "content": """You are an APIGen Function Calling Tool. You will be provided with a user query and associated tools for answering the query.

query (string): The query or problem statement.
tools (array): An array of available tools that can be used to solve the query. Each tool is represented as an object with the following properties:
    name (string): The name of the tool.
    description (string): A brief description of what the tool does.
    parameters (object): An object representing the parameters required by the tool. Each parameter is represented as a key-value pair, where the key is the parameter name and the value is an object with the following properties:
        type (string): The data type of the parameter (e.g., "int", "float", "list").
        description (string): A brief description of the parameter.
        required (boolean): Indicates whether the parameter is required or optional.

You will provide the Answers array. The Answers array provides the specific tool and arguments used to generate each answer."""}
x2 = {"role": "user", "content": None}
x3 = {"role": "assistant", "content": None}  # training-time role; unused at inference

user_template = 'Query: {Q} Tools: {T}'
response_template = '{A}'  # training-time response template; kept for reference

Q = "Where can I find live giveaways for beta access and games?"
T = """[{"name": "live_giveaways_by_type", "description": "Retrieve live giveaways from the GamerPower API based on the specified type.", "parameters": {"type": {"description": "The type of giveaways to retrieve (e.g., game, loot, beta).", "type": "str", "default": "game"}}}]"""

x2['content'] = user_template.format(Q=Q, T=T)
prompts = [x1, x2]

input_ids = tokenizer.apply_chat_template(
    prompts,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
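## Parse the Tool Calls

The model is trained to answer with the dataset's answers array, i.e. a JSON list naming the tool and arguments for each call. The sketch below is one minimal way to turn the decoded response into `(tool, arguments)` pairs; the helper `parse_tool_calls` is illustrative, not part of the model, and it assumes the output is a bare JSON array of `{"name": ..., "arguments": ...}` objects, which may not hold for every query.

```python
import json

def parse_tool_calls(text: str):
    """Parse an xlam-style answers array into (tool name, arguments) pairs.

    Expects text such as:
    [{"name": "live_giveaways_by_type", "arguments": {"type": "beta"}}]
    Returns an empty list if the model produced non-JSON output.
    """
    try:
        calls = json.loads(text)
    except json.JSONDecodeError:
        return []  # model produced non-JSON output; handle upstream
    return [(c.get("name"), c.get("arguments", {})) for c in calls if isinstance(c, dict)]

decoded = tokenizer.decode(response, skip_special_tokens=True)
for name, args in parse_tool_calls(decoded):
    print(f"tool: {name}  arguments: {args}")
```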
## Size Comparison

The table compares the VRAM required to load and to train the FP16 base model against the 4-bit BitsAndBytes-quantized model with PEFT. Values for the base model are taken from Hugging Face's [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).

| Model              | Total Size | Training Using Adam |
| ------------------ | ---------- | ------------------- |
| Base Model         | 28.21 GB   | 56.42 GB            |
| 4bitQuantized+PEFT | 5.21 GB    | 13 GB               |

## Training Details

### Training Data

**Dataset:** [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)

Trained on the `instruction` column of 20,000 randomly shuffled rows.

### Training Procedure

Custom training loop written with Hugging Face Accelerate.

#### Training Hyperparameters

- **Optimizer:** AdamW
- **lr:** 2e-5
- **decay:** linear
- **batch_size:** 1
- **gradient_accumulation_steps:** 2
- **fp16:** True

LoraConfig

- **r:** 8
- **lora_alpha:** 32
- **task_type:** TaskType.CAUSAL_LM
- **lora_dropout:** 0.1

A hedged code sketch reconstructing this configuration is given in the appendix at the end of this card.

#### Hardware

- **GPU:** P100

## Acknowledgment

- Thanks to [@Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise introduction to quantization.
- Thanks to the [@HuggingFace Team](https://huggingface.co/blog/gptq-integration#fine-tune-quantized-models-with-peft) for the [notebook](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing) on GPTQ.
- Thanks to the [@HuggingFace Team](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for the blog on 4-bit transformers with bitsandbytes.
- Thanks to [@Salesforce](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) for the marvelous dataset.

## Model Card Authors

Swastik Maiti
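## Appendix: Training Setup Sketch

The hyperparameters listed above imply roughly the configuration below. This is a hedged reconstruction, not the author's original script: `target_modules` and the step count are not stated on this card and are assumptions, and the dataloader and loop body are omitted.

```python
import torch
from transformers import AutoModelForCausalLM, get_linear_schedule_with_warmup
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from accelerate import Accelerator

# fp16 and gradient_accumulation_steps=2 come from the hyperparameters above.
accelerator = Accelerator(mixed_precision="fp16", gradient_accumulation_steps=2)

# Load the 4-bit quantized base model and prepare it for k-bit training.
base = AutoModelForCausalLM.from_pretrained(
    "SwastikM/Meta-Llama-3-8B-Instruct_bitsandbytes_4bit",
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# r, lora_alpha, lora_dropout and task_type match the card;
# target_modules is an assumption made for illustration.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "v_proj"],  # assumption, not stated on the card
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# AdamW at lr=2e-5 with linear decay, per the hyperparameters above.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
num_training_steps = 10_000  # illustrative; depends on the subset size
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)

model, optimizer, scheduler = accelerator.prepare(model, optimizer, scheduler)
# ...standard Accelerate training loop over a batch_size=1 dataloader goes here...
```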