MadeAgents/Hammer2.0-7b

qypeng committed on Oct 9, 2024

Commit 41d40b5 · verified · 1 Parent(s): 8010f9c

Update README.md

Files changed (1)

README.md +2 -2

@@ -14,12 +14,12 @@ We're excited to release lightweight Hammer 2.0 models ([0.5B](https://huggingfa
 Hammer2.0 finetuned based on [Qwen 2.5 series](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e) and [Qwen 2.5 coder series](https://huggingface.co/collections/Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f) using function masking techniques. It's trained using the [APIGen Function Calling Datasets](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) containing 60,000 samples, supplemented by [xlam-irrelevance-7.5k](https://huggingface.co/datasets/MadeAgents/xlam-irrelevance-7.5k) we generated. Hammer2.0 has achieved exceptional performances across numerous function calling benchmarks. For detailed data construction, training methods, and evaluation strategies, please refer to our paper [Hammer: Robust Function-Calling for On-Device Language Models via Function Masking](https://arxiv.org/abs/2410.04587) and the [Hammer GitHub repository](https://github.com/MadeAgents/Hammer) .
 ## Evaluation
-The evaluation results of Hammer 2.0 models on the Berkeley Function-Calling Leaderboard (BFCL) are presented in the following table:
+The evaluation results of Hammer 2.0 models on the Berkeley Function-Calling Leaderboard (BFCL-v3) are presented in the following table:
 <div style="text-align: center;">
     <img src="v2_figures/bfcl.PNG" alt="overview" width="1000" style="margin: auto;">
 </div>
-Our Hammer 2.0 series consistently achieves corresponding best performance at comparable scales. The 7B model outperforms most function calling enchanced models, while the 1B is competitive with Gemma.
+Our Hammer 2.0 series consistently achieves corresponding best performance at comparable scales. The 7B model outperforms most function calling enchanced models, and the 1.5B model also achieves unexpected performance.
 In addition, we evaluated the Hammer 2.0 models on other academic benchmarks to further demonstrate the generalization ability of our models.