Update README.md
README.md
CHANGED
@@ -20,7 +20,7 @@ The Hammer 2.5 models, fine-tuned from the Qwen 2.5 coder series, inherit Hammer
 ## Evaluation
 The evaluation results of Hammer 2.1 models on the Berkeley Function-Calling Leaderboard (BFCL-v3) are presented in the following table:
 <div style="text-align: center;">
-<img src="v2_figures/bfcl.
+<img src="v2_figures/bfcl.png" alt="overview" width="1000" style="margin: auto;">
 </div>
 
 Our Hammer 2.1 series consistently achieves corresponding best performance at comparable scales. The 7B/3B/1.5B model outperforms most function calling enchanced models.
@@ -28,7 +28,7 @@ Our Hammer 2.1 series consistently achieves corresponding best performance at co
 In addition, we evaluated the Hammer 2.1 models on other academic benchmarks to further demonstrate the generalization ability of our models.
 
 <div style="text-align: center;">
-<img src="v2_figures/others-v2.
+<img src="v2_figures/others-v2.png" alt="overview" width="1000" style="margin: auto;">
 </div>
 
 Hammer 2.1 models showcase highly stable performance, suggesting the robustness of Hammer 2.1 series. In contrast, the baseline approaches display varying levels of effectiveness.