---
license: cc-by-nc-4.0
datasets:
- Salesforce/xlam-function-calling-60k
- MadeAgents/xlam-irrelevance-7.5k
base_model:
- Qwen/Qwen2.5-Coder-0.5B-Instruct
---
# Hammer2.1-0.5b Function Calling Model

## Introduction

Hammer is a series of lightweight Large Action Models. We are currently releasing Hammer 2.1 models ([0.5B](https://huggingface.co/MadeAgents/Hammer2.1-0.5b), [1.5B](https://huggingface.co/MadeAgents/Hammer2.1-1.5b), [3B](https://huggingface.co/MadeAgents/Hammer2.1-3b), and [7B](https://huggingface.co/MadeAgents/Hammer2.1-7b)) with strong function-calling capability. These models are based on the Qwen 2.5 Coder series and employ [function masking techniques](https://arxiv.org/abs/2410.04587) along with other advanced methods. The Hammer 2.1 series brings significant enhancements: it retains Hammer 2.0's single-turn interaction functionality while further strengthening other capabilities.

## Model Details
The Hammer 2.1 models, fine-tuned from the Qwen 2.5 Coder series, inherit Hammer 2.0's advantages and add the following enhancements:
- <span style="color: red;">Multi-Step Function Calling:</span> The assistant can perform multiple internal function calls to handle a single user request, actively planning and gathering information to fulfill complex tasks (an illustrative message trace follows this list).
- <span style="color: red;">Multi-Turn Function Calling:</span> Enables continuous, context-aware interactions across multiple exchanges, with each turn potentially containing multiple steps, for a more natural conversation experience.
- Enhanced Irrelevant Information Inspection: Better at identifying when the provided functions are irrelevant to a user query, in which case the model responds without a function call.
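
For a concrete picture of these interaction patterns, the sketch below shows what a multi-step exchange might look like as a message list. The tool names and payloads here are hypothetical; the message roles and fenced-JSON call format follow the usage examples later in this card.
~~~
# Hypothetical multi-step trace: to satisfy one user request, the assistant
# issues two internal tool calls before it can answer in natural language.
messages = [
    {"role": "user", "content": "Book the cheapest flight from NYC to LA tomorrow."},
    {"role": "assistant", "content": '```\n{"name": "search_flights", "arguments": {"from": "NYC", "to": "LA", "date": "tomorrow"}}\n```'},
    {"role": "tool", "name": "search_flights", "content": '{"flights": [{"id": "AA100", "price": 129}, {"id": "UA200", "price": 189}]}'},
    {"role": "assistant", "content": '```\n{"name": "book_flight", "arguments": {"id": "AA100"}}\n```'},
    {"role": "tool", "name": "book_flight", "content": '{"status": "confirmed"}'},
    # The assistant would now reply to the user with a normal text message.
]
~~~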

## Evaluation
The evaluation results of Hammer 2.1 models on the Berkeley Function-Calling Leaderboard (BFCL-v3) are presented in the following table:
<div style="text-align: center;">
    <img src="v2_figures/bfcl.png" alt="overview" width="1000" style="margin: auto;">
</div>

Our Hammer 2.1 series consistently achieves the best performance at comparable scales. The 7B/3B/1.5B models outperform most function-calling-enhanced models.

In addition, we evaluated the Hammer 2.1 models on other academic benchmarks to further demonstrate the generalization ability of our models.

<div style="text-align: center;">
    <img src="v2_figures/others-v2.png" alt="overview" width="1000" style="margin: auto;">
</div>

Hammer 2.1 models showcase highly stable performance, suggesting the robustness of the Hammer 2.1 series. In contrast, the baseline approaches display varying levels of effectiveness.




## Requirements
The code for the Hammer 2.1 models is included in the latest Hugging Face Transformers, and we advise you to install `transformers>=4.47.0`.
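For example:
```
pip install "transformers>=4.47.0"
```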

## How to Use
Hammer models offer flexibility in deployment and usage, fully supporting both **vLLM** deployment and **Hugging Face Transformers** tool calling. Below are the specifics on how to make use of these features:

### Using vLLM
#### Option 1: Using the Hammer client (Recommended)

Before using vLLM, first clone the Hammer code repository and change into the `Hammer` directory:
```
git clone https://github.com/MadeAgents/Hammer.git
cd Hammer
```

vLLM offers efficient serving with lower latency. To serve the model with vLLM:
```
vllm serve MadeAgents/Hammer2.1-0.5b --host 0.0.0.0 --port 8000 --tensor-parallel-size 1
```
Once the model is served, you can use the following Hammer client to interact with it for function calling:
~~~
from client import HammerChatCompletion, HammerConfig
config = HammerConfig(base_url="http://localhost:8000/v1/", model="MadeAgents/Hammer2.1-0.5b")
llm = HammerChatCompletion.from_config(config)

# Example conversation
messages = [
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant","content": '```\n{"name": "get_weather", "arguments": {"location": "New York, NY ", "unit": "celsius"}\n```'},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
    {"role": "user", "content": "Now, search for the weather in San Francisco."}
]

# Example function definition (optional)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    },
    {
        "name": "respond",
        "description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The content of the message to respond to."}
            },
            "required": ["message"]
        }
    }
]

response = llm.completion(messages, tools=tools)
print(response)
~~~


#### Option 2: Using vLLM’s built-in tool calling
Hammer 2.1 supports vLLM’s built-in tool calling; this functionality requires `vllm>=0.6`. To enable it, start vLLM’s OpenAI-compatible server with:
~~~
vllm serve MadeAgents/Hammer2.1-0.5b --enable-auto-tool-choice --tool-call-parser hermes
~~~
Then use it the same way you would use GPT’s tool calling:
~~~
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the users location.",
                        "default": "celsius"
                    },
                },
                "required": ["location","format"],
            },
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_n_day_weather_forecast",
            "description": "Get an N-day weather forecast",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the users location.",
                        "default": "celsius"
                    },
                    "num_days": {
                        "type": "integer",
                        "description": "The number of days to forecast",
                        "default": 1
                    }
                },
                "required": ["location", "format", "num_days"]
            },
        }
    },
]


from openai import OpenAI
openai_api_key = "None"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

query = """What's the weather like today in San Francisco"""

chat_response = client.chat.completions.create(
    model="MadeAgents/Hammer2.1-0.5b",
    messages=[
        {"role": "user", "content": query},],
    tools=tools,
    temperature=0
)
print(chat_response.choices[0].message.content)
~~~
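
Note that with `--enable-auto-tool-choice`, the parsed calls are typically returned in `message.tool_calls` rather than `message.content` (which may be empty when a tool is invoked). A minimal sketch for inspecting them, continuing from the client above:
~~~
# Each parsed call carries the function name and a JSON string of arguments.
for tool_call in chat_response.choices[0].message.tool_calls or []:
    print(tool_call.function.name, tool_call.function.arguments)
~~~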


### Using Hugging Face Transformers
Hammer 2.1’s chat template also includes a tool-calling template, meaning you can use Hugging Face Transformers’ tool-calling support. Here is a simple example of how to use our model with Transformers.
~~~
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained("MadeAgents/Hammer2.1-0.5b")
model = AutoModelForCausalLM.from_pretrained("MadeAgents/Hammer2.1-0.5b", torch_dtype=torch.bfloat16, device_map="auto")

# Example conversation
messages = [
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant","content": '```\n{"name": "get_weather", "arguments": {"location": "New York, NY ", "unit": "celsius"}\n```'},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
    {"role": "user", "content": "Now, search for the weather in San Francisco."}
]

# Example function definition (optional)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    },
    {
        "name": "respond",
        "description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The content of the message to respond to."}
            },
            "required": ["message"]
        }
    }
]

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):], skip_special_tokens=True))
~~~
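
The model emits its function call as a fenced JSON block, as in the assistant turn above. Below is a minimal sketch for turning that decoded text into a Python dict, assuming that output format (the helper name is ours, not part of the library):
~~~
import json
import re

def parse_tool_call(text: str):
    """Extract the first fenced JSON object from the output; None if absent."""
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    return json.loads(match.group(1)) if match else None

# e.g. parse_tool_call(decoded_text) -> {"name": "get_weather", "arguments": {...}}
~~~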