---
license: apache-2.0
language:
- en
tags:
- gemma
- function calling
- on-device language model
- android
- conversational
inference: false
---

# Octopus V1: On-device language model for function calling of software APIs
- Nexa AI Product - ArXiv

## Introducing Octopus-V1

Octopus-V1 is a series of open-source language models, ranging from 2B to 7B parameters, built by Nexa AI for AI-driven interaction with software APIs. The models are fine-tuned on a specialized dataset derived from more than 30,000 APIs on RapidAPI Hub, which teaches them API structure and syntax. They use conditional masking to produce precise, format-compliant API calls without compromising inference speed. A new benchmark introduced alongside Octopus-V1 evaluates software API usage, and on it Octopus-V1 outperforms GPT-4, a step toward automating software development and API integration.

📱 **Support 30k+ APIs from RapidAPI Hub**: Octopus is trained on a dataset derived from over 30,000 popular APIs on RapidAPI Hub. This gives the model broad coverage of diverse software API interactions and makes it useful across many applications.

🐙 **Accuracy**: Octopus is obtained by fine-tuning base models with 2B, 3B, and 7B parameters, and it surpasses GPT-4 in API call accuracy. Adding a conditional mask improves precision further, making Octopus reliable for software API interactions.

🎯 **Conditional Masking**: A conditional masking technique constrains outputs to the desired format and reduces errors. It maintains fast inference speed while substantially increasing the model's accuracy in generating function calls and parameters.

## Example Use Cases

You can run the model on a GPU using the following code.
```python
import time

import torch
# GemmaForCausalLM ships with transformers (version 4.38 or later).
from transformers import AutoTokenizer, GemmaForCausalLM


def inference(input_text):
    """Greedily generate a response and report wall-clock latency in seconds."""
    start_time = time.time()
    input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
    input_length = input_ids["input_ids"].shape[1]
    outputs = model.generate(
        input_ids=input_ids["input_ids"],
        max_length=1024,
        do_sample=False,
    )
    # Keep only the newly generated tokens, dropping the prompt.
    generated_sequence = outputs[:, input_length:].tolist()
    res = tokenizer.decode(generated_sequence[0])
    end_time = time.time()
    return {"output": res, "latency": end_time - start_time}


model_id = "NexaAIDev/android_API_10k_data"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = GemmaForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

input_text = "Take a selfie for me with front camera"
nexa_query = f"Below is the query from the users, please call the correct function and generate the parameters to call the function.\n\nQuery: {input_text} \n\nResponse:"

start_time = time.time()
print("nexa model result:\n", inference(nexa_query))
print("latency:", time.time() - start_time, " s")
```

## Evaluation

## License

This model was trained on commercially viable data and is under the [Nexa AI community disclaimer](https://www.nexa4ai.com/disclaimer).

## References

We thank the Google Gemma team for their amazing models!

```
@misc{gemma-2023-open-models,
  author = {{Gemma Team, Google DeepMind}},
  title = {Gemma: Open Models Based on Gemini Research and Technology},
  url = {https://goo.gle/GemmaReport},
  year = {2023},
}
```

## Citation

```
@misc{TODO}
```

## Contact

Please [contact us](mailto:dev@nexa4ai.com) with any issues or comments!
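
## Appendix: Conditional Masking Sketch

The conditional masking mentioned in the introduction can be illustrated with a toy example. This is a minimal sketch and not Nexa AI's implementation: it assumes that at each decoding step some set of token ids is permitted by the output format, and it sets every other logit to negative infinity before greedy selection. The vocabulary, the logit values, and the helper names `apply_conditional_mask` and `greedy_pick` are all hypothetical.

```python
import math


def apply_conditional_mask(logits, allowed_ids):
    """Return a copy of logits with every token outside allowed_ids set to -inf."""
    allowed = set(allowed_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def greedy_pick(logits):
    """Return the index of the largest logit (greedy decoding for one step)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical toy vocabulary and logits for a single decoding step.
vocab = ["take_photo", "(", ")", "camera", "=", "'front'", "hello"]
logits = [1.0, 0.2, 0.1, 0.5, 0.3, 0.4, 2.5]

# Assumption: the output format requires a function name at this step,
# so only token 0 is allowed.
allowed_ids = [0]

unmasked_choice = vocab[greedy_pick(logits)]
masked = apply_conditional_mask(logits, allowed_ids)
masked_choice = vocab[greedy_pick(masked)]

print("without mask:", unmasked_choice)
print("with mask:", masked_choice)
```

Without the mask, the highest-scoring token is free-form text; with the mask, the step can only yield a token that fits the expected call format. The mask is a single pass over the logits, so it adds little work per decoding step.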