munish0838 committed
Commit
6a87c45
1 Parent(s): 014fe2e

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +63 -62
README.md CHANGED
@@ -71,76 +71,80 @@ Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at genera
 
 ## GPT4All:
 ```
- |    Task     |Version| Metric |Value |   |Stderr|
- |-------------|------:|--------|-----:|---|-----:|
- |arc_challenge|      0|acc     |0.5529|±  |0.0145|
- |             |       |acc_norm|0.5870|±  |0.0144|
- |arc_easy     |      0|acc     |0.8371|±  |0.0076|
- |             |       |acc_norm|0.8144|±  |0.0080|
- |boolq        |      1|acc     |0.8599|±  |0.0061|
- |hellaswag    |      0|acc     |0.6133|±  |0.0049|
- |             |       |acc_norm|0.7989|±  |0.0040|
- |openbookqa   |      0|acc     |0.3940|±  |0.0219|
- |             |       |acc_norm|0.4680|±  |0.0223|
- |piqa         |      0|acc     |0.8063|±  |0.0092|
- |             |       |acc_norm|0.8156|±  |0.0090|
- |winogrande   |      0|acc     |0.7372|±  |0.0124|
+ |    Tasks    |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
+ |-------------|------:|------|-----:|--------|---|-----:|---|-----:|
+ |arc_challenge|      1|none  |     0|acc     |↑  |0.4411|±  |0.0145|
+ |             |       |none  |     0|acc_norm|↑  |0.4377|±  |0.0145|
+ |arc_easy     |      1|none  |     0|acc     |↑  |0.7399|±  |0.0090|
+ |             |       |none  |     0|acc_norm|↑  |0.6566|±  |0.0097|
+ |boolq        |      2|none  |     0|acc     |↑  |0.8327|±  |0.0065|
+ |hellaswag    |      1|none  |     0|acc     |↑  |0.5453|±  |0.0050|
+ |             |       |none  |     0|acc_norm|↑  |0.7047|±  |0.0046|
+ |openbookqa   |      1|none  |     0|acc     |↑  |0.3480|±  |0.0213|
+ |             |       |none  |     0|acc_norm|↑  |0.4280|±  |0.0221|
+ |piqa         |      1|none  |     0|acc     |↑  |0.7639|±  |0.0099|
+ |             |       |none  |     0|acc_norm|↑  |0.7584|±  |0.0100|
+ |winogrande   |      1|none  |     0|acc     |↑  |0.6590|±  |0.0133|
 ```
 
- Average: 72.59
+ Average: 64.00
 
 ## AGIEval:
 ```
- |             Task             |Version| Metric |Value |   |Stderr|
- |------------------------------|------:|--------|-----:|---|-----:|
- |agieval_aqua_rat              |      0|acc     |0.2441|±  |0.0270|
- |                              |       |acc_norm|0.2441|±  |0.0270|
- |agieval_logiqa_en             |      0|acc     |0.3687|±  |0.0189|
- |                              |       |acc_norm|0.3840|±  |0.0191|
- |agieval_lsat_ar               |      0|acc     |0.2304|±  |0.0278|
- |                              |       |acc_norm|0.2174|±  |0.0273|
- |agieval_lsat_lr               |      0|acc     |0.5471|±  |0.0221|
- |                              |       |acc_norm|0.5373|±  |0.0221|
- |agieval_lsat_rc               |      0|acc     |0.6617|±  |0.0289|
- |                              |       |acc_norm|0.6357|±  |0.0294|
- |agieval_sat_en                |      0|acc     |0.7670|±  |0.0295|
- |                              |       |acc_norm|0.7379|±  |0.0307|
- |agieval_sat_en_without_passage|      0|acc     |0.4417|±  |0.0347|
- |                              |       |acc_norm|0.4223|±  |0.0345|
- |agieval_sat_math              |      0|acc     |0.4000|±  |0.0331|
- |                              |       |acc_norm|0.3455|±  |0.0321|
+ |            Tasks             |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
+ |------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
+ |agieval_aqua_rat              |      1|none  |     0|acc     |↑  |0.2283|±  |0.0264|
+ |                              |       |none  |     0|acc_norm|↑  |0.2441|±  |0.0270|
+ |agieval_logiqa_en             |      1|none  |     0|acc     |↑  |0.3057|±  |0.0181|
+ |                              |       |none  |     0|acc_norm|↑  |0.3272|±  |0.0184|
+ |agieval_lsat_ar               |      1|none  |     0|acc     |↑  |0.2304|±  |0.0278|
+ |                              |       |none  |     0|acc_norm|↑  |0.1957|±  |0.0262|
+ |agieval_lsat_lr               |      1|none  |     0|acc     |↑  |0.3784|±  |0.0215|
+ |                              |       |none  |     0|acc_norm|↑  |0.3588|±  |0.0213|
+ |agieval_lsat_rc               |      1|none  |     0|acc     |↑  |0.4610|±  |0.0304|
+ |                              |       |none  |     0|acc_norm|↑  |0.4275|±  |0.0302|
+ |agieval_sat_en                |      1|none  |     0|acc     |↑  |0.6019|±  |0.0342|
+ |                              |       |none  |     0|acc_norm|↑  |0.5340|±  |0.0348|
+ |agieval_sat_en_without_passage|      1|none  |     0|acc     |↑  |0.3981|±  |0.0342|
+ |                              |       |none  |     0|acc_norm|↑  |0.3981|±  |0.0342|
+ |agieval_sat_math              |      1|none  |     0|acc     |↑  |0.2500|±  |0.0293|
+ |                              |       |none  |     0|acc_norm|↑  |0.2636|±  |0.0298|
 ```
 
- Average: 44.05
+ Average: 34.36
 
 ## BigBench:
 
 ```
-
- |                      Task                      |Version|       Metric        |Value |   |Stderr|
- |------------------------------------------------|------:|---------------------|-----:|---|-----:|
- |bigbench_causal_judgement                       |      0|multiple_choice_grade|0.6000|±  |0.0356|
- |bigbench_date_understanding                     |      0|multiple_choice_grade|0.6585|±  |0.0247|
- |bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3178|±  |0.0290|
- |bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.2340|±  |0.0224|
- |                                                |       |exact_str_match      |0.0000|±  |0.0000|
- |bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2980|±  |0.0205|
- |bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.2057|±  |0.0153|
- |bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.5367|±  |0.0288|
- |bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.4040|±  |0.0220|
- |bigbench_navigate                               |      0|multiple_choice_grade|0.4970|±  |0.0158|
- |bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.7075|±  |0.0102|
- |bigbench_ruin_names                             |      0|multiple_choice_grade|0.4821|±  |0.0236|
- |bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2295|±  |0.0133|
- |bigbench_snarks                                 |      0|multiple_choice_grade|0.6906|±  |0.0345|
- |bigbench_sports_understanding                   |      0|multiple_choice_grade|0.5375|±  |0.0159|
- |bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.6270|±  |0.0153|
- |bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.2216|±  |0.0118|
- |bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1594|±  |0.0088|
- |bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.5367|±  |0.0288|
+ |                          Tasks                         |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
+ |--------------------------------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
+ |leaderboard_bbh_boolean_expressions                     |      1|none  |     3|acc_norm|↑  |0.7560|±  |0.0272|
+ |leaderboard_bbh_causal_judgement                        |      1|none  |     3|acc_norm|↑  |0.6043|±  |0.0359|
+ |leaderboard_bbh_date_understanding                      |      1|none  |     3|acc_norm|↑  |0.3280|±  |0.0298|
+ |leaderboard_bbh_disambiguation_qa                       |      1|none  |     3|acc_norm|↑  |0.5880|±  |0.0312|
+ |leaderboard_bbh_formal_fallacies                        |      1|none  |     3|acc_norm|↑  |0.5280|±  |0.0316|
+ |leaderboard_bbh_geometric_shapes                        |      1|none  |     3|acc_norm|↑  |0.3560|±  |0.0303|
+ |leaderboard_bbh_hyperbaton                              |      1|none  |     3|acc_norm|↑  |0.6280|±  |0.0306|
+ |leaderboard_bbh_logical_deduction_five_objects          |      1|none  |     3|acc_norm|↑  |0.3400|±  |0.0300|
+ |leaderboard_bbh_logical_deduction_seven_objects         |      1|none  |     3|acc_norm|↑  |0.2880|±  |0.0287|
+ |leaderboard_bbh_logical_deduction_three_objects         |      1|none  |     3|acc_norm|↑  |0.4160|±  |0.0312|
+ |leaderboard_bbh_movie_recommendation                    |      1|none  |     3|acc_norm|↑  |0.6760|±  |0.0297|
+ |leaderboard_bbh_navigate                                |      1|none  |     3|acc_norm|↑  |0.5800|±  |0.0313|
+ |leaderboard_bbh_object_counting                         |      1|none  |     3|acc_norm|↑  |0.3640|±  |0.0305|
+ |leaderboard_bbh_penguins_in_a_table                     |      1|none  |     3|acc_norm|↑  |0.3836|±  |0.0404|
+ |leaderboard_bbh_reasoning_about_colored_objects         |      1|none  |     3|acc_norm|↑  |0.3560|±  |0.0303|
+ |leaderboard_bbh_ruin_names                              |      1|none  |     3|acc_norm|↑  |0.4160|±  |0.0312|
+ |leaderboard_bbh_salient_translation_error_detection     |      1|none  |     3|acc_norm|↑  |0.3080|±  |0.0293|
+ |leaderboard_bbh_snarks                                  |      1|none  |     3|acc_norm|↑  |0.5618|±  |0.0373|
+ |leaderboard_bbh_sports_understanding                    |      1|none  |     3|acc_norm|↑  |0.6600|±  |0.0300|
+ |leaderboard_bbh_temporal_sequences                      |      1|none  |     3|acc_norm|↑  |0.2320|±  |0.0268|
+ |leaderboard_bbh_tracking_shuffled_objects_five_objects  |      1|none  |     3|acc_norm|↑  |0.1640|±  |0.0235|
+ |leaderboard_bbh_tracking_shuffled_objects_seven_objects |      1|none  |     3|acc_norm|↑  |0.1480|±  |0.0225|
+ |leaderboard_bbh_tracking_shuffled_objects_three_objects |      1|none  |     3|acc_norm|↑  |0.3120|±  |0.0294|
+ |leaderboard_bbh_web_of_lies                             |      1|none  |     3|acc_norm|↑  |0.5080|±  |0.0317|
 ```
 
- Average: 44.13
+ Average: 43.76
 
 
 # Prompt Format
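The updated tables above are in EleutherAI lm-evaluation-harness output format (Version/Filter/n-shot columns, with ↑ marking higher-is-better). As a minimal sketch of how a run like the GPT4All suite could be reproduced with the harness's Python API: the task list is read off that table, and the model repo name is an assumption inferred from the GGUF link later in this card.

```python
# Hedged sketch: reproduce the zero-shot GPT4All-suite numbers with
# EleutherAI's lm-evaluation-harness (pip install lm-eval). The repo name
# below is an assumption based on the GGUF link in this README.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=NousResearch/Hermes-3-Llama-3.2-3B,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,  # the table above reports n-shot = 0
)

# results["results"] maps each task to acc/acc_norm and their stderr,
# matching the columns printed in the table.
for task, metrics in results["results"].items():
    print(task, metrics)
```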
@@ -183,7 +187,7 @@ To utilize the prompt format without a system prompt, simply leave the line out.
 
 ## Prompt Format for Function Calling
 
- # Note: This version uses USER as both the user prompt and the tool response role. This is due to a bug we experienced when training. It will require modification to the function calling code!
+ # Note: A previous version used USER as both the user prompt and the tool response role, but this has now been fixed. Please use USER for the user prompt role and TOOL for the tool response role.
 
 Our model was trained on specific system prompts and structures for Function Calling.
 
@@ -212,7 +216,7 @@ The model will then generate a tool call, which your inference code must parse,
 
 Once you parse the tool call, call the api and get the returned values for the call, and pass it back in as a new role, `tool` like so:
 ```
- <|im_start|>user
+ <|im_start|>tool
 <tool_response>
 {"name": "get_stock_fundamentals", "content": {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.', 'sector': 'Consumer Cyclical', 'industry': 'Auto Manufacturers', 'market_cap': 611384164352, 'pe_ratio': 49.604652, 'pb_ratio': 9.762013, 'dividend_yield': None, 'eps': 4.3, 'beta': 2.427, '52_week_high': 299.29, '52_week_low': 152.37}}
 </tool_response>
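For reference, a minimal sketch of the round trip described in that hunk, using the corrected `tool` role. The `<tool_call>` tag format follows the Hermes function-calling examples, and `get_stock_fundamentals` is a stub standing in for your real API call.

```python
# Hedged sketch of the tool-call round trip: parse the model's <tool_call>,
# run the API, and wrap the result in the corrected <|im_start|>tool turn.
import json
import re


def extract_tool_call(assistant_text: str) -> dict:
    """Pull the JSON payload out of a <tool_call>...</tool_call> block."""
    match = re.search(r"<tool_call>\s*(\{.*\})\s*</tool_call>",
                      assistant_text, re.DOTALL)
    if match is None:
        raise ValueError("no tool call found in model output")
    return json.loads(match.group(1))


def format_tool_response(name: str, content: dict) -> str:
    """Wrap an API result as the `tool` role turn shown in the diff above."""
    payload = json.dumps({"name": name, "content": content})
    return f"<|im_start|>tool\n<tool_response>\n{payload}\n</tool_response><|im_end|>\n"


def get_stock_fundamentals(symbol: str) -> dict:
    """Stub for the real API call; replace with your own implementation."""
    return {"symbol": symbol, "pe_ratio": 49.604652}


# Example round trip with a hand-written assistant turn:
assistant_turn = ('<tool_call>{"name": "get_stock_fundamentals", '
                  '"arguments": {"symbol": "TSLA"}}</tool_call>')
call = extract_tool_call(assistant_turn)
result = get_stock_fundamentals(**call["arguments"])
print(format_tool_response(call["name"], result))
```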
@@ -318,6 +322,3 @@ GGUF Quants: https://huggingface.co/NousResearch/Hermes-3-Llama-3.2-3B-GGUF
   url={https://arxiv.org/abs/2408.11857},
 }
 ```
-
-
-