Safetensors
qwen2
Guanyu419 committed on
Commit
bc1db5b
1 Parent(s): 70cc39c

Update README.md

Files changed (1)
  1. README.md +3 -19
README.md CHANGED
@@ -29,25 +29,9 @@ The evaluation result of Hammer2.0 series on the Berkeley Function-Calling Leade
 
 
  In addition, we evaluated Hammer2.0 on other academic benchmarks to further show our model's generalization ability:
-
- | Model | Size | Func-Name+Args Det. (F1 Func-Name \| F1 Args) | | | | | | | | | | F1 Average | |
- |:---------------------------:|:----:|:---------------------------------------------:|:-----:|:------------:|:-----:|:-----------:|:-----:|:---------------------:|:-----:|:-----------:|:-----:|:----------:|:-----:|
- | | | API-Bank L-1 | | API-Bank L-2 | | Tool-Alpaca | | SealTool(Single-Tool) | | Nexus Raven | | Func Name | Args |
- | GPT-4o-mini (Prompt) | -- | 95.1% | 89.3% | 84.3% | 67.5% | 64.3% | 54.7% | 87.9% | 86.0% | 91.7% | 84.6% | 84.7% | 76.4% |
- | qwen2-7b-instruct | 7B | 81.5% | 60.6% | 95.7% | 49.5% | 71.6% | 48.1% | 93.9% | 77.5% | 87.1% | 63.5% | 85.9% | 59.8% |
- | qwen1.5-4b-Chat | 4B | 55.3% | 59.8% | 46.7% | 38.5% | 35.4% | 17.0% | 48.4% | 62.3% | 29.0% | 33.7% | 43.0% | 42.2% |
- | qwen2-1.5b-instruct | 1.5B | 74.6% | 63.6% | 57.7% | 33.6% | 65.8% | 45.2% | 82.1% | 75.5% | 70.6% | 45.5% | 70.2% | 52.7% |
- | Gorilla-openfunctions-v2 | 7B | 69.2% | 70.3% | 48.8% | 54.7% | 72.9% | 51.3% | 93.2% | 91.1% | 72.8% | 68.4% | 71.4% | 67.2% |
- | GRANITE-20B-FUNCTIONCALLING | 20B | 90.4% | 77.8% | 78.9% | 59.2% | 77.3% | 58.0% | 94.9% | 92.7% | 94.5% | 75.1% | 87.2% | 72.6% |
- | xlam-7b-fc-r | 7B | 90.0% | 80.7% | 72.5% | 64.2% | 67.3% | 59.0% | 79.0% | 76.9% | 54.1% | 57.5% | 72.6% | 67.7% |
- | xlam-1b-fc-r | 1.3B | 94.9% | 83.7% | 91.8% | 64.3% | 64.9% | 50.6% | 90.7% | 80.4% | 64.4% | 54.8% | 81.3% | 66.8% |
- | Hammer-7b | 7B | 93.5% | 85.8% | 82.9% | 66.4% | 82.3% | 59.9% | 97.4% | 91.7% | 92.5% | 77.4% | 89.7% | 76.2% |
- | Hammer-4b | 4B | 91.6% | 81.5% | 77.6% | 61.0% | 85.1% | 57.0% | 96.4% | 92.4% | 81.7% | 64.9% | 86.5% | 71.4% |
- | Hammer-1.5b | 1.5B | 82.1% | 72.3% | 79.8% | 59.7% | 80.9% | 53.5% | 95.6% | 88.6% | 79.9% | 56.9% | 83.7% | 66.2% |
- | Hammer2.0-0.5B | 0.5B | 81.2% | 67.8% | 62.9% | 52.0% | 79.1% | 50.9% | 94.9% | 83.8% | 74.7% | 49.0% | 78.5% | 60.7% |
- | Hammer2.0-1.5B | 1.5B | 90.2% | 80.4% | 82.9% | 63.8% | 86.2% | 59.5% | 97.5% | 92.5% | 86.4% | 65.5% | 88.6% | 72.4% |
- | Hammer2.0-3B | 3B | 93.6% | 84.3% | 83.7% | 59.0% | 83.1% | 58.8% | 95.3% | 91.2% | 92.5% | 70.5% | 89.6% | 72.8% |
- | Hammer2.0-7B | 7B | 91.0% | 82.1% | 82.5% | 65.1% | 85.2% | 59.6% | 96.8% | 92.7% | 93.0% | 80.5% | 89.7% | 76.0% |
+ <div style="text-align: center;">
+ <img src="v2_figures/others.PNG" alt="overview" width="1000" style="margin: auto;">
+ </div>
 
  On comparison, Hammer 2.0 outperform models with similar sizes and even surpass many larger models overall.
 