Commit 41d40b5 (verified), committed by qypeng · 1 Parent(s): 8010f9c

Update README.md

  1. README.md +2 -2
README.md CHANGED
@@ -14,12 +14,12 @@ We're excited to release lightweight Hammer 2.0 models ([0.5B](https://huggingfa
  Hammer2.0 finetuned based on [Qwen 2.5 series](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e) and [Qwen 2.5 coder series](https://huggingface.co/collections/Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f) using function masking techniques. It's trained using the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by [xlam-irrelevance-7.5k](https://huggingface.co/datasets/MadeAgents/xlam-irrelevance-7.5k) we generated. Hammer2.0 has achieved exceptional performances across numerous function calling benchmarks. For detailed data construction, training methods, and evaluation strategies, please refer to our paper [Hammer: Robust Function-Calling for On-Device Language Models via Function Masking](https://arxiv.org/abs/2410.04587) and the [Hammer GitHub repository](https://github.com/MadeAgents/Hammer) .
 
  ## Evaluation
- The evaluation results of Hammer 2.0 models on the Berkeley Function-Calling Leaderboard (BFCL) are presented in the following table:
+ The evaluation results of Hammer 2.0 models on the Berkeley Function-Calling Leaderboard (BFCL-v3) are presented in the following table:
  <div style="text-align: center;">
  <img src="v2_figures/bfcl.PNG" alt="overview" width="1000" style="margin: auto;">
  </div>
 
- Our Hammer 2.0 series consistently achieves corresponding best performance at comparable scales. The 7B model outperforms most function calling enchanced models, while the 1B is competitive with Gemma.
+ Our Hammer 2.0 series consistently achieves corresponding best performance at comparable scales. The 7B model outperforms most function calling enchanced models, and the 1.5B model also achieves unexpected performance.
 
  In addition, we evaluated the Hammer 2.0 models on other academic benchmarks to further demonstrate the generalization ability of our models.
 