munish0838 committed
Commit
6a87c45
1 Parent(s): 014fe2e

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +63 -62
README.md CHANGED
@@ -71,76 +71,80 @@ Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at genera
 
 ## GPT4All:
 ```
- |    Task     |Version| Metric |Value |   |Stderr|
- |-------------|------:|--------|-----:|---|-----:|
- |arc_challenge|      0|acc     |0.5529|±  |0.0145|
- |             |       |acc_norm|0.5870|±  |0.0144|
- |arc_easy     |      0|acc     |0.8371|±  |0.0076|
- |             |       |acc_norm|0.8144|±  |0.0080|
- |boolq        |      1|acc     |0.8599|±  |0.0061|
- |hellaswag    |      0|acc     |0.6133|±  |0.0049|
- |             |       |acc_norm|0.7989|±  |0.0040|
- |openbookqa   |      0|acc     |0.3940|±  |0.0219|
- |             |       |acc_norm|0.4680|±  |0.0223|
- |piqa         |      0|acc     |0.8063|±  |0.0092|
- |             |       |acc_norm|0.8156|±  |0.0090|
- |winogrande   |      0|acc     |0.7372|±  |0.0124|
+ |    Tasks    |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
+ |-------------|------:|------|-----:|--------|---|-----:|---|-----:|
+ |arc_challenge|      1|none  |     0|acc     |↑  |0.4411|±  |0.0145|
+ |             |       |none  |     0|acc_norm|↑  |0.4377|±  |0.0145|
+ |arc_easy     |      1|none  |     0|acc     |↑  |0.7399|±  |0.0090|
+ |             |       |none  |     0|acc_norm|↑  |0.6566|±  |0.0097|
+ |boolq        |      2|none  |     0|acc     |↑  |0.8327|±  |0.0065|
+ |hellaswag    |      1|none  |     0|acc     |↑  |0.5453|±  |0.0050|
+ |             |       |none  |     0|acc_norm|↑  |0.7047|±  |0.0046|
+ |openbookqa   |      1|none  |     0|acc     |↑  |0.3480|±  |0.0213|
+ |             |       |none  |     0|acc_norm|↑  |0.4280|±  |0.0221|
+ |piqa         |      1|none  |     0|acc     |↑  |0.7639|±  |0.0099|
+ |             |       |none  |     0|acc_norm|↑  |0.7584|±  |0.0100|
+ |winogrande   |      1|none  |     0|acc     |↑  |0.6590|±  |0.0133|
 ```
 
- Average: 72.59
+ Average: 64.00
 
 ## AGIEval:
 ```
- |             Task             |Version| Metric |Value |   |Stderr|
- |------------------------------|------:|--------|-----:|---|-----:|
- |agieval_aqua_rat              |      0|acc     |0.2441|±  |0.0270|
- |                              |       |acc_norm|0.2441|±  |0.0270|
- |agieval_logiqa_en             |      0|acc     |0.3687|±  |0.0189|
- |                              |       |acc_norm|0.3840|±  |0.0191|
- |agieval_lsat_ar               |      0|acc     |0.2304|±  |0.0278|
- |                              |       |acc_norm|0.2174|±  |0.0273|
- |agieval_lsat_lr               |      0|acc     |0.5471|±  |0.0221|
- |                              |       |acc_norm|0.5373|±  |0.0221|
- |agieval_lsat_rc               |      0|acc     |0.6617|±  |0.0289|
- |                              |       |acc_norm|0.6357|±  |0.0294|
- |agieval_sat_en                |      0|acc     |0.7670|±  |0.0295|
- |                              |       |acc_norm|0.7379|±  |0.0307|
- |agieval_sat_en_without_passage|      0|acc     |0.4417|±  |0.0347|
- |                              |       |acc_norm|0.4223|±  |0.0345|
- |agieval_sat_math              |      0|acc     |0.4000|±  |0.0331|
- |                              |       |acc_norm|0.3455|±  |0.0321|
+ |            Tasks             |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
+ |------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
+ |agieval_aqua_rat              |      1|none  |     0|acc     |↑  |0.2283|±  |0.0264|
+ |                              |       |none  |     0|acc_norm|↑  |0.2441|±  |0.0270|
+ |agieval_logiqa_en             |      1|none  |     0|acc     |↑  |0.3057|±  |0.0181|
+ |                              |       |none  |     0|acc_norm|↑  |0.3272|±  |0.0184|
+ |agieval_lsat_ar               |      1|none  |     0|acc     |↑  |0.2304|±  |0.0278|
+ |                              |       |none  |     0|acc_norm|↑  |0.1957|±  |0.0262|
+ |agieval_lsat_lr               |      1|none  |     0|acc     |↑  |0.3784|±  |0.0215|
+ |                              |       |none  |     0|acc_norm|↑  |0.3588|±  |0.0213|
+ |agieval_lsat_rc               |      1|none  |     0|acc     |↑  |0.4610|±  |0.0304|
+ |                              |       |none  |     0|acc_norm|↑  |0.4275|±  |0.0302|
+ |agieval_sat_en                |      1|none  |     0|acc     |↑  |0.6019|±  |0.0342|
+ |                              |       |none  |     0|acc_norm|↑  |0.5340|±  |0.0348|
+ |agieval_sat_en_without_passage|      1|none  |     0|acc     |↑  |0.3981|±  |0.0342|
+ |                              |       |none  |     0|acc_norm|↑  |0.3981|±  |0.0342|
+ |agieval_sat_math              |      1|none  |     0|acc     |↑  |0.2500|±  |0.0293|
+ |                              |       |none  |     0|acc_norm|↑  |0.2636|±  |0.0298|
 ```
 
- Average: 44.05
+ Average: 34.36
 
 ## BigBench:
 
 ```
-
- |                      Task                      |Version|       Metric        |Value |   |Stderr|
- |------------------------------------------------|------:|---------------------|-----:|---|-----:|
- |bigbench_causal_judgement                       |      0|multiple_choice_grade|0.6000|±  |0.0356|
- |bigbench_date_understanding                     |      0|multiple_choice_grade|0.6585|±  |0.0247|
- |bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3178|±  |0.0290|
- |bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.2340|±  |0.0224|
- |                                                |       |exact_str_match      |0.0000|±  |0.0000|
- |bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2980|±  |0.0205|
- |bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.2057|±  |0.0153|
- |bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.5367|±  |0.0288|
- |bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.4040|±  |0.0220|
- |bigbench_navigate                               |      0|multiple_choice_grade|0.4970|±  |0.0158|
- |bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.7075|±  |0.0102|
- |bigbench_ruin_names                             |      0|multiple_choice_grade|0.4821|±  |0.0236|
- |bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2295|±  |0.0133|
- |bigbench_snarks                                 |      0|multiple_choice_grade|0.6906|±  |0.0345|
- |bigbench_sports_understanding                   |      0|multiple_choice_grade|0.5375|±  |0.0159|
- |bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.6270|±  |0.0153|
- |bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.2216|±  |0.0118|
- |bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1594|±  |0.0088|
- |bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.5367|±  |0.0288|
+ |                          Tasks                         |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
+ |--------------------------------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
+ |leaderboard_bbh_boolean_expressions                     |      1|none  |     3|acc_norm|↑  |0.7560|±  |0.0272|
+ |leaderboard_bbh_causal_judgement                        |      1|none  |     3|acc_norm|↑  |0.6043|±  |0.0359|
+ |leaderboard_bbh_date_understanding                      |      1|none  |     3|acc_norm|↑  |0.3280|±  |0.0298|
+ |leaderboard_bbh_disambiguation_qa                       |      1|none  |     3|acc_norm|↑  |0.5880|±  |0.0312|
+ |leaderboard_bbh_formal_fallacies                        |      1|none  |     3|acc_norm|↑  |0.5280|±  |0.0316|
+ |leaderboard_bbh_geometric_shapes                        |      1|none  |     3|acc_norm|↑  |0.3560|±  |0.0303|
+ |leaderboard_bbh_hyperbaton                              |      1|none  |     3|acc_norm|↑  |0.6280|±  |0.0306|
+ |leaderboard_bbh_logical_deduction_five_objects          |      1|none  |     3|acc_norm|↑  |0.3400|±  |0.0300|
+ |leaderboard_bbh_logical_deduction_seven_objects         |      1|none  |     3|acc_norm|↑  |0.2880|±  |0.0287|
+ |leaderboard_bbh_logical_deduction_three_objects         |      1|none  |     3|acc_norm|↑  |0.4160|±  |0.0312|
+ |leaderboard_bbh_movie_recommendation                    |      1|none  |     3|acc_norm|↑  |0.6760|±  |0.0297|
+ |leaderboard_bbh_navigate                                |      1|none  |     3|acc_norm|↑  |0.5800|±  |0.0313|
+ |leaderboard_bbh_object_counting                         |      1|none  |     3|acc_norm|↑  |0.3640|±  |0.0305|
+ |leaderboard_bbh_penguins_in_a_table                     |      1|none  |     3|acc_norm|↑  |0.3836|±  |0.0404|
+ |leaderboard_bbh_reasoning_about_colored_objects         |      1|none  |     3|acc_norm|↑  |0.3560|±  |0.0303|
+ |leaderboard_bbh_ruin_names                              |      1|none  |     3|acc_norm|↑  |0.4160|±  |0.0312|
+ |leaderboard_bbh_salient_translation_error_detection     |      1|none  |     3|acc_norm|↑  |0.3080|±  |0.0293|
+ |leaderboard_bbh_snarks                                  |      1|none  |     3|acc_norm|↑  |0.5618|±  |0.0373|
+ |leaderboard_bbh_sports_understanding                    |      1|none  |     3|acc_norm|↑  |0.6600|±  |0.0300|
+ |leaderboard_bbh_temporal_sequences                      |      1|none  |     3|acc_norm|↑  |0.2320|±  |0.0268|
+ |leaderboard_bbh_tracking_shuffled_objects_five_objects  |      1|none  |     3|acc_norm|↑  |0.1640|±  |0.0235|
+ |leaderboard_bbh_tracking_shuffled_objects_seven_objects |      1|none  |     3|acc_norm|↑  |0.1480|±  |0.0225|
+ |leaderboard_bbh_tracking_shuffled_objects_three_objects |      1|none  |     3|acc_norm|↑  |0.3120|±  |0.0294|
+ |leaderboard_bbh_web_of_lies                             |      1|none  |     3|acc_norm|↑  |0.5080|±  |0.0317|
 ```
 
- Average: 44.13
+ Average: 43.76
 
 
 # Prompt Format
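The updated tables above are in EleutherAI lm-evaluation-harness output format (Version/Filter/n-shot columns, with ↑ marking higher-is-better). As a minimal sketch of how a run like the GPT4All suite could be reproduced with the harness's Python API: the task list is read off that table, and the model repo name is an assumption inferred from the GGUF link later in this card.

```python
# Hedged sketch: reproduce the zero-shot GPT4All-suite numbers with
# EleutherAI's lm-evaluation-harness (pip install lm-eval). The repo name
# below is an assumption based on the GGUF link in this README.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=NousResearch/Hermes-3-Llama-3.2-3B,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,  # the table above reports n-shot = 0
)

# results["results"] maps each task to acc/acc_norm and their stderr,
# matching the columns printed in the table.
for task, metrics in results["results"].items():
    print(task, metrics)
```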
@@ -183,7 +187,7 @@ To utilize the prompt format without a system prompt, simply leave the line out.
 
 ## Prompt Format for Function Calling
 
- # Note: This version uses USER as both the user prompt and the tool response role. This is due to a bug we experienced when training. It will require modification to the function calling code!
+ # Note: A previous version used USER as both the user prompt and the tool response role, but this has now been fixed. Please use USER for the user prompt role and TOOL for the tool response role.
 
 Our model was trained on specific system prompts and structures for Function Calling.
 
@@ -212,7 +216,7 @@ The model will then generate a tool call, which your inference code must parse,
 
 Once you parse the tool call, call the api and get the returned values for the call, and pass it back in as a new role, `tool` like so:
 ```
- <|im_start|>user
+ <|im_start|>tool
 <tool_response>
 {"name": "get_stock_fundamentals", "content": {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.', 'sector': 'Consumer Cyclical', 'industry': 'Auto Manufacturers', 'market_cap': 611384164352, 'pe_ratio': 49.604652, 'pb_ratio': 9.762013, 'dividend_yield': None, 'eps': 4.3, 'beta': 2.427, '52_week_high': 299.29, '52_week_low': 152.37}}
 </tool_response>
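For reference, a minimal sketch of the round trip described in that hunk, using the corrected `tool` role. The `<tool_call>` tag format follows the Hermes function-calling examples, and `get_stock_fundamentals` is a stub standing in for your real API call.

```python
# Hedged sketch of the tool-call round trip: parse the model's <tool_call>,
# run the API, and wrap the result in the corrected <|im_start|>tool turn.
import json
import re


def extract_tool_call(assistant_text: str) -> dict:
    """Pull the JSON payload out of a <tool_call>...</tool_call> block."""
    match = re.search(r"<tool_call>\s*(\{.*\})\s*</tool_call>",
                      assistant_text, re.DOTALL)
    if match is None:
        raise ValueError("no tool call found in model output")
    return json.loads(match.group(1))


def format_tool_response(name: str, content: dict) -> str:
    """Wrap an API result as the `tool` role turn shown in the diff above."""
    payload = json.dumps({"name": name, "content": content})
    return f"<|im_start|>tool\n<tool_response>\n{payload}\n</tool_response><|im_end|>\n"


def get_stock_fundamentals(symbol: str) -> dict:
    """Stub for the real API call; replace with your own implementation."""
    return {"symbol": symbol, "pe_ratio": 49.604652}


# Example round trip with a hand-written assistant turn:
assistant_turn = ('<tool_call>{"name": "get_stock_fundamentals", '
                  '"arguments": {"symbol": "TSLA"}}</tool_call>')
call = extract_tool_call(assistant_turn)
result = get_stock_fundamentals(**call["arguments"])
print(format_tool_response(call["name"], result))
```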
@@ -318,6 +322,3 @@ GGUF Quants: https://huggingface.co/NousResearch/Hermes-3-Llama-3.2-3B-GGUF
   url={https://arxiv.org/abs/2408.11857},
 }
 ```
-
-
-