---
license: cc-by-nc-4.0
datasets:
- Salesforce/xlam-function-calling-60k
- MadeAgents/xlam-irrelevance-7.5k
base_model:
- Qwen/Qwen2.5-Coder-0.5B-Instruct
---
# Hammer2.1-0.5b Function Calling Model
## Introduction
Hammer refers to a series of lightweight Large Action Models. We are currently releasing the Hammer 2.1 models ([0.5B](https://huggingface.co/MadeAgents/Hammer2.1-0.5b), [1.5B](https://huggingface.co/MadeAgents/Hammer2.1-1.5b), [3B](https://huggingface.co/MadeAgents/Hammer2.1-3b), and [7B](https://huggingface.co/MadeAgents/Hammer2.1-7b)) with strong function calling capabilities. These models are based on the Qwen 2.5 Coder series and use [function masking techniques](https://arxiv.org/abs/2410.04587) along with other techniques. The Hammer 2.1 series brings significant enhancements while retaining Hammer 2.0's single-turn function calling and further strengthening its other capabilities.
## Model Details
The Hammer 2.1 models, fine-tuned from the Qwen 2.5 coder series, inherit Hammer 2.0's advantages and are enhanced as follows:
- <span style="color: red;">Multi-Step Function Calling:</span> The assistant can perform multiple internal function calls to handle a single user request, actively planning and gathering information to fulfill complex tasks.
- <span style="color: red;">Multi-Turn Function Calling:</span> Enables continuous and context-aware interactions over multiple exchanges, with each turn potentially containing multiple steps, for a more natural conversation experience.
- Enhanced Irrelevant Information Inspection: Better at recognizing when the provided functions are irrelevant to a user query and, in that case, responding without a function call.
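In a typical setup, multi-step calling is driven by the application: it executes each function the model requests, appends the result as a `tool` message, and queries the model again until it produces a final answer. The sketch below illustrates only that control flow. It assumes a hypothetical `model` callable that returns one already-parsed call per step, a hypothetical local `get_weather`, and that the model signals completion through a `respond` function like the one in the tool definitions in the usage examples; the actual prompt format is handled by the model's chat template and is not reproduced here.

```python
import json

# Hypothetical local tool, standing in for a real weather API.
def get_weather(location, unit="celsius"):
    return {"location": location, "temperature": 22, "unit": unit}

TOOLS = {"get_weather": get_weather}


def run_agent(model, messages, max_steps=5):
    """Query `model` until it calls `respond`, executing other calls in between.

    `model` is assumed to take the message list and return one parsed call,
    e.g. {"name": "get_weather", "arguments": {"location": "Paris"}}.
    """
    for _ in range(max_steps):
        call = model(messages)
        if call["name"] == "respond":
            return call["arguments"]["message"]
        # Record the call, run the tool, and feed its result back as a tool message.
        messages.append({"role": "assistant", "content": json.dumps(call)})
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"], "content": json.dumps(result)})
    raise RuntimeError("model did not produce a final response within max_steps")


def scripted_model(calls):
    """Offline stand-in for the model that replays a fixed sequence of calls."""
    remaining = iter(calls)
    return lambda messages: next(remaining)


model = scripted_model([
    {"name": "get_weather", "arguments": {"location": "New York, NY"}},
    {"name": "get_weather", "arguments": {"location": "San Francisco, CA"}},
    {"name": "respond", "arguments": {"message": "Checked both cities."}},
])
history = [{"role": "user", "content": "Compare the weather in New York and San Francisco."}]
print(run_agent(model, history))  # Checked both cities.
```

For multi-turn use, the next user message would be appended to the same `history` before calling `run_agent` again, so each turn can itself contain several steps.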
## Evaluation
The evaluation results of Hammer 2.1 models on the Berkeley Function-Calling Leaderboard (BFCL-v3) are presented in the following table:
<div style="text-align: center;">
<img src="v2_figures/bfcl.png" alt="overview" width="1000" style="margin: auto;">
</div>
Our Hammer 2.1 series consistently achieves the best performance among models of comparable scale. The 7B, 3B, and 1.5B models outperform most function-calling-enhanced models.
In addition, we evaluated the Hammer 2.1 models on other academic benchmarks to further demonstrate the generalization ability of our models.
<div style="text-align: center;">
<img src="v2_figures/others-v2.png" alt="overview" width="1000" style="margin: auto;">
</div>
Hammer 2.1 models show highly stable performance across these benchmarks, suggesting the robustness of the series, whereas the baseline approaches vary considerably in effectiveness.
## Requirements
Support for Hammer 2.1 models is included in recent versions of Hugging Face Transformers; we advise installing `transformers>=4.47.0`.
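If you want to fail fast on an older installation, a small check like the one below compares the installed `transformers` version against 4.47.0 using only the standard library. It is a simplified sketch: pre-release suffixes are ignored (so `4.47.0.dev0` counts as `4.47.0`), and `packaging.version.Version` is the more rigorous choice when that package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 47, 0)


def parse_release(ver):
    """Convert '4.47.1' or '4.48.0.dev0' into a comparable tuple such as (4, 48, 0).

    Only the leading digits of each of the first three components are kept,
    so pre-release and dev suffixes are ignored.
    """
    parts = []
    for piece in ver.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts + [0] * (3 - len(parts)))


def transformers_is_recent_enough():
    """Return True if transformers is installed and meets the minimum version."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= MIN_TRANSFORMERS


print(transformers_is_recent_enough())
```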
## How to Use
Hammer models offer flexibility in deployment and usage, fully supporting both **vLLM** deployment and **Hugging Face Transformers** tool calling. Below are the specifics on how to make use of these features:
### Using vLLM
#### Option 1: Using Hammer client (Recommended)
First, clone the Hammer code repository, which provides the `client` module used by the Hammer client, and change into the `Hammer` directory:
```
git clone https://github.com/MadeAgents/Hammer.git
cd Hammer
```
vLLM offers efficient serving with lower latency. To serve the model with vLLM:
```
vllm serve MadeAgents/Hammer2.1-0.5b --host 0.0.0.0 --port 8000 --tensor-parallel-size 1
```
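Loading the model can take a while, so it may help to confirm the server is up before sending requests. vLLM's OpenAI-compatible server exposes the standard `GET /v1/models` endpoint; the sketch below queries it with the standard library and returns the served model IDs. The host and port are assumed to match the command above, and `fetch_served_models` / `served_model_ids` are illustrative helper names, not part of vLLM or the Hammer client.

```python
import json
import urllib.request


def served_model_ids(payload):
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_served_models(base_url="http://localhost:8000/v1", timeout=5.0):
    """Return the IDs served at base_url; raises URLError if the server is not reachable."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return served_model_ids(json.load(resp))


# Offline check of the parsing step on a sample response body.
sample = {"object": "list", "data": [{"id": "MadeAgents/Hammer2.1-0.5b", "object": "model"}]}
print(served_model_ids(sample))  # ['MadeAgents/Hammer2.1-0.5b']
```

Once the server is running, `fetch_served_models()` should list `MadeAgents/Hammer2.1-0.5b`.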
Once the model is served, you can use the following Hammer client to interact with it for function calling:
~~~python
# `client` is the module from the cloned Hammer repository
from client import HammerChatCompletion, HammerConfig

config = HammerConfig(base_url="http://localhost:8000/v1/", model="MadeAgents/Hammer2.1-0.5b")
llm = HammerChatCompletion.from_config(config)

# Example conversation: the earlier assistant turn is a function call wrapped in a fenced JSON block
messages = [
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant", "content": '```\n{"name": "get_weather", "arguments": {"location": "New York, NY", "unit": "celsius"}}\n```'},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
    {"role": "user", "content": "Now, search for the weather in San Francisco."},
]

# Example function definitions (optional)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"},
            },
            "required": ["location"],
        },
    },
    {
        "name": "respond",
        "description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The content of the message to respond to."},
            },
            "required": ["message"],
        },
    },
]

response = llm.completion(messages, tools=tools)
print(response)
~~~
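In the conversation above, the assistant's earlier function call is a JSON object wrapped in a Markdown code fence. If you work with raw generated text rather than the Hammer client (for example, the Transformers output further down), a helper along these lines can turn that text into Python dicts. It is a sketch that assumes the model emits calls in that same fenced format, as either a single object or a list of objects; `parse_function_calls` is a hypothetical name, and anything that does not parse is treated as an ordinary, non-function-call reply.

```python
import json
import re

# Optional fence (three backticks, optionally tagged "json") around a JSON payload.
FENCED = re.compile(r"^\s*`{3}(?:json)?\s*(.*?)\s*`{3}\s*$", re.DOTALL)


def parse_function_calls(text):
    """Return a list of {"name": ..., "arguments": ...} dicts, or [] for a plain reply."""
    match = FENCED.match(text)
    body = match.group(1) if match else text
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        return []  # not JSON, so treat it as a normal text answer
    calls = parsed if isinstance(parsed, list) else [parsed]
    return [c for c in calls if isinstance(c, dict) and "name" in c]


fence = "`" * 3
reply = fence + '\n{"name": "get_weather", "arguments": {"location": "San Francisco, CA"}}\n' + fence
print(parse_function_calls(reply))
print(parse_function_calls("It is partly cloudy in New York."))  # []
```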
#### Option 2: Using vLLM’s built-in tool calling
Hammer2.1 supports vLLM's built-in tool calling, which requires vllm>=0.6. To enable it, start vLLM's OpenAI-compatible server with:
~~~
vllm serve MadeAgents/Hammer2.1-0.5b --enable-auto-tool-choice --tool-call-parser hermes
~~~
Then use it the same way you would use OpenAI's (GPT) tool calling:
~~~python
from openai import OpenAI

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                        "default": "celsius",
                    },
                },
                "required": ["location", "format"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_n_day_weather_forecast",
            "description": "Get an N-day weather forecast",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                        "default": "celsius",
                    },
                    "num_days": {
                        "type": "integer",
                        "description": "The number of days to forecast",
                        "default": 1,
                    },
                },
                "required": ["location", "format", "num_days"],
            },
        },
    },
]

# vLLM does not check the API key unless the server was started with --api-key,
# but the OpenAI client still requires a non-empty value.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

query = "What's the weather like today in San Francisco?"
chat_response = client.chat.completions.create(
    model="MadeAgents/Hammer2.1-0.5b",
    messages=[{"role": "user", "content": query}],
    tools=tools,
    temperature=0,
)

message = chat_response.choices[0].message
# With --enable-auto-tool-choice, parsed calls are returned in `tool_calls`;
# `content` holds any plain-text reply and may be None when a tool is called.
if message.tool_calls:
    for call in message.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    print(message.content)
~~~
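When the model decides to call a tool, each entry in `message.tool_calls` carries the function name and a JSON-encoded argument string, and the application runs the function and sends the result back in a `"role": "tool"` message keyed by `tool_call_id` (the standard OpenAI chat format). The sketch below does that for the two tools declared above. The local implementations are hypothetical placeholders, and a `SimpleNamespace` stands in for the SDK's tool-call object so the example runs offline.

```python
import json
from types import SimpleNamespace


# Hypothetical local implementations of the two tools declared above.
# `format` mirrors the schema's parameter name (it shadows the built-in only inside these functions).
def get_current_weather(location, format="celsius"):
    return {"location": location, "temperature": 18, "unit": format}


def get_n_day_weather_forecast(location, format="celsius", num_days=1):
    return {"location": location, "days": num_days, "unit": format}


TOOL_IMPLS = {
    "get_current_weather": get_current_weather,
    "get_n_day_weather_forecast": get_n_day_weather_forecast,
}


def run_tool_calls(tool_calls):
    """Execute each tool call and build the role="tool" messages to send back."""
    results = []
    for call in tool_calls or []:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        output = TOOL_IMPLS[call.function.name](**args)
        results.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(output)})
    return results


# Offline demo with a stand-in object shaped like an SDK tool call.
demo_call = SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(
        name="get_current_weather",
        arguments='{"location": "San Francisco, CA", "format": "fahrenheit"}',
    ),
)
print(run_tool_calls([demo_call]))
```

With a live response, `run_tool_calls(chat_response.choices[0].message.tool_calls)` would produce the tool messages; sending the original user message, the assistant message containing the tool calls, and these tool messages in a second `client.chat.completions.create` call lets the model compose its final answer.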
### Using Hugging Face Transformers
Hammer2.1's chat template also includes a tool calling template, so you can use Hugging Face Transformers' tool calling support. Below is a simple example of using the model with Transformers.
~~~python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MadeAgents/Hammer2.1-0.5b")
model = AutoModelForCausalLM.from_pretrained("MadeAgents/Hammer2.1-0.5b", torch_dtype=torch.bfloat16, device_map="auto")

# Example conversation: the earlier assistant turn is a function call wrapped in a fenced JSON block
messages = [
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant", "content": '```\n{"name": "get_weather", "arguments": {"location": "New York, NY", "unit": "celsius"}}\n```'},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
    {"role": "user", "content": "Now, search for the weather in San Francisco."},
]

# Example function definitions (optional)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"},
            },
            "required": ["location"],
        },
    },
    {
        "name": "respond",
        "description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The content of the message to respond to."},
            },
            "required": ["message"],
        },
    },
]

# Render the conversation and tool definitions with the model's chat template
inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):], skip_special_tokens=True))
~~~
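The Hammer client and Transformers examples describe tools as flat `{"name", "description", "parameters"}` dicts, while the vLLM example uses OpenAI's wrapped `{"type": "function", "function": {...}}` form. If you want to maintain a single set of definitions, a helper like the hypothetical `to_flat_tools` below unwraps them into the flat form. This is a convenience sketch only; it does not establish whether the chat template also accepts the wrapped form directly.

```python
def to_flat_tools(tools):
    """Unwrap OpenAI-style tool entries; entries already in flat form pass through unchanged."""
    flat = []
    for tool in tools:
        if tool.get("type") == "function" and isinstance(tool.get("function"), dict):
            flat.append(tool["function"])
        else:
            flat.append(tool)
    return flat


wrapped = [
    {"type": "function", "function": {"name": "get_current_weather", "parameters": {"type": "object", "properties": {}}}},
    {"name": "respond", "parameters": {"type": "object", "properties": {}}},
]
print([t["name"] for t in to_flat_tools(wrapped)])  # ['get_current_weather', 'respond']
```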