Safetensors
qwen2
qypeng commited on
Commit
8010f9c
·
verified ·
1 Parent(s): 24e1a34

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +8 -6
README.md CHANGED
@@ -2,7 +2,7 @@
2
  license: cc-by-4.0
3
  datasets:
4
  - Salesforce/xlam-function-calling-60k
5
- - MadeAgents/XLAM-7.5k-Irrelevance
6
  base_model:
7
  - Qwen/Qwen2.5-Coder-7B-Instruct
8
  ---
@@ -11,24 +11,26 @@ base_model:
11
  We're excited to release lightweight Hammer 2.0 models ([0.5B](https://huggingface.co/MadeAgents/Hammer2.0-0.5b) , [1.5B](https://huggingface.co/MadeAgents/Hammer2.0-1.5b) , [3B](https://huggingface.co/MadeAgents/Hammer2.0-3b) , and [7B](https://huggingface.co/MadeAgents/Hammer2.0-7b)) with strong function calling capability, which empower developers to build personalized, on-device agentic applications.
12
 
13
  ## Model Details
14
- Hammer2.0 finetuned based on [Qwen 2.5 series](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e) and [Qwen 2.5 coder series](https://huggingface.co/collections/Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f) using function masking techniques. It's trained using the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by [XLAM-7.5k-Irrelevance](https://huggingface.co/datasets/MadeAgents/XLAM-7.5k-Irrelevance) we generated. Hammer2.0 has achieved exceptional performances across numerous function calling benchmarks. For detailed data construction, training methods, and evaluation strategies, please refer to our paper [Hammer: Robust Function-Calling for On-Device Language Models via Function Masking](https://arxiv.org/abs/2410.04587) and the [Hammer GitHub repository](https://github.com/MadeAgents/Hammer) .
15
 
16
  ## Evaluation
17
- The evaluation results of Hammer 2.0 series on the Berkeley Function-Calling Leaderboard (BFCL) are presented in the following table:
18
  <div style="text-align: center;">
19
  <img src="v2_figures/bfcl.PNG" alt="overview" width="1000" style="margin: auto;">
20
  </div>
21
 
 
 
 
22
 
23
- In addition, we evaluated Hammer2.0 on other academic benchmarks to further show our model's generalization ability:
24
  <div style="text-align: center;">
25
  <img src="v2_figures/others.PNG" alt="overview" width="1000" style="margin: auto;">
26
  </div>
27
 
28
- On comparison, Hammer 2.0 outperforms models with similar sizes and even surpass many larger models overall.
29
 
30
  ## Requiements
31
- The code of Hammer2.0 models have been in the latest Hugging face transformers and we advise you to install `transformers>=4.37.0`.
32
 
33
  ## How to Use
34
  This is a simple example of how to use our model.
 
2
  license: cc-by-4.0
3
  datasets:
4
  - Salesforce/xlam-function-calling-60k
5
+ - MadeAgents/xlam-irrelevance-7.5k
6
  base_model:
7
  - Qwen/Qwen2.5-Coder-7B-Instruct
8
  ---
 
11
  We're excited to release lightweight Hammer 2.0 models ([0.5B](https://huggingface.co/MadeAgents/Hammer2.0-0.5b) , [1.5B](https://huggingface.co/MadeAgents/Hammer2.0-1.5b) , [3B](https://huggingface.co/MadeAgents/Hammer2.0-3b) , and [7B](https://huggingface.co/MadeAgents/Hammer2.0-7b)) with strong function calling capability, which empower developers to build personalized, on-device agentic applications.
12
 
13
  ## Model Details
14
+ Hammer2.0 finetuned based on [Qwen 2.5 series](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e) and [Qwen 2.5 coder series](https://huggingface.co/collections/Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f) using function masking techniques. It's trained using the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by [xlam-irrelevance-7.5k](https://huggingface.co/datasets/MadeAgents/xlam-irrelevance-7.5k) we generated. Hammer2.0 has achieved exceptional performances across numerous function calling benchmarks. For detailed data construction, training methods, and evaluation strategies, please refer to our paper [Hammer: Robust Function-Calling for On-Device Language Models via Function Masking](https://arxiv.org/abs/2410.04587) and the [Hammer GitHub repository](https://github.com/MadeAgents/Hammer) .
15
 
16
  ## Evaluation
17
+ The evaluation results of Hammer 2.0 models on the Berkeley Function-Calling Leaderboard (BFCL) are presented in the following table:
18
  <div style="text-align: center;">
19
  <img src="v2_figures/bfcl.PNG" alt="overview" width="1000" style="margin: auto;">
20
  </div>
21
 
22
+ Our Hammer 2.0 series consistently achieves corresponding best performance at comparable scales. The 7B model outperforms most function calling enchanced models, while the 1B is competitive with Gemma.
23
+
24
+ In addition, we evaluated the Hammer 2.0 models on other academic benchmarks to further demonstrate the generalization ability of our models.
25
 
 
26
  <div style="text-align: center;">
27
  <img src="v2_figures/others.PNG" alt="overview" width="1000" style="margin: auto;">
28
  </div>
29
 
30
+ Hammer 2.0 models showcase highly stable performance, suggesting the robustness of Hammer 2.0 series. In contrast, the baseline approaches display varying levels of effectiveness on these other benchmarks.
31
 
32
  ## Requiements
33
+ The code of Hammer 2.0 models have been in the latest Hugging face transformers and we advise you to install `transformers>=4.37.0`.
34
 
35
  ## How to Use
36
  This is a simple example of how to use our model.