---
base_model: NousResearch/Meta-Llama-3-8B
tags:
- Llama-3
- instruct
- finetune
- chatml
- DPO
- RLHF
- gpt4
- synthetic data
- distillation
- function calling
- json mode
model-index:
- name: Hermes-2-Pro-Llama-3-8B
  results: []
license: apache-2.0
language:
- en
datasets:
- teknium/OpenHermes-2.5
---

> [!NOTE]
> This model is assumed to perform well, but it may require more testing and user feedback. Be aware that only models featured in the GPT4All GUI are curated and officially supported by Nomic. Use at your own risk.

# Hermes 2 Pro - Llama-3 8B Quantized to Q4_0 for the GPT4All community by 3Simplex

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/ggO2sBDJ8Bhc6w-zwTx5j.png)

## Model Description

This is the llama.cpp GGUF quantized version of Hermes 2 Pro Llama-3 8B. For the full version, see [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B).

Hermes 2 Pro is an upgraded version of Nous Hermes 2, trained on an updated and cleaned version of the OpenHermes 2.5 dataset as well as a new function-calling and JSON-mode dataset developed in-house.

This new version of Hermes retains its strong general-task and conversation capabilities while also excelling at function calling and JSON structured outputs. It improves on several other metrics as well, scoring 90% on our function-calling evaluation built in partnership with Fireworks.AI and 84% on our structured JSON output evaluation.

Hermes 2 Pro uses a special system prompt and a multi-turn function-calling structure with a new ChatML role to make function calling reliable and easy to parse. Learn more about prompting below.

This version of Hermes 2 Pro adds several special tokens that help agentic applications parse output while tokens are streaming: `<tools>`, `<tool_call>`, `<tool_response>`, and their closing tags are now single tokens.
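Because each tag is a single token, a streaming client can spot a tool call by comparing whole decoded tokens against the tag text instead of searching a growing character buffer. The sketch below assumes the stream arrives as individually decoded token strings and that the text between `<tool_call>` and `</tool_call>` is a JSON object; `extract_tool_calls` and the sample payload (`get_weather`) are hypothetical, not part of any Nous Research API.

```python
import json
from typing import Iterable, Iterator


def extract_tool_calls(tokens: Iterable[str]) -> Iterator[dict]:
    """Yield each tool call found in a stream of decoded tokens.

    Assumes <tool_call> and </tool_call> each arrive as one whole token and
    that the text between them is a JSON object (assumed payload format).
    """
    inside = False
    buffer: list[str] = []
    for token in tokens:
        tag = token.strip()
        if tag == "<tool_call>":
            # Start collecting the payload of a new tool call.
            inside, buffer = True, []
        elif tag == "</tool_call>" and inside:
            # Closing tag: the buffered text should be one complete JSON object.
            inside = False
            yield json.loads("".join(buffer))
        elif inside:
            buffer.append(token)
        # Tokens outside a tool call are ordinary text; a real client would
        # stream them to the user as they arrive.


# Hypothetical stream: a short reply followed by one tool call.
stream = ["Let me check.", "<tool_call>", "\n", '{"name": "get_weather", ',
          '"arguments": {"city": "Paris"}}', "\n", "</tool_call>"]
calls = list(extract_tool_calls(stream))
print(calls)
```

Matching on whole tokens relies on the tags being single tokens in this model's vocabulary; with a tokenizer that split them into pieces, the client would have to buffer characters and search for the tag text instead.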

This work was a collaboration between Nous Research, @interstellarninja, and Fireworks.AI.

Learn more about the function-calling system for this model in our GitHub repository: https://github.com/NousResearch/Hermes-Function-Calling

# Prompt Format

Hermes 2 Pro uses ChatML as its prompt format, which provides a much more structured system for engaging the LLM in multi-turn chat dialogue.

System prompts allow steerability and new ways to interact with an LLM, setting the rules, roles, and stylistic choices of the model.

This is a more complex format than Alpaca or ShareGPT: special tokens denote the beginning and end of each turn, along with a role for each turn.

This format enables OpenAI endpoint compatibility, and anyone familiar with the ChatGPT API will recognize it, since it is the same format OpenAI uses.

Prompt with a system instruction (use whatever system prompt you like; this is just an example):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
```

```
<|im_start|>user
{user input}<|im_end|>
<|im_start|>assistant
{assistant response}<|im_end|>
```
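To build these prompts from an OpenAI-style list of messages, each message can be wrapped as `<|im_start|>{role}\n{content}<|im_end|>` and the turns joined with newlines, leaving an open assistant turn for the model to complete. This is a minimal sketch assuming the `role`/`content` dictionary shape used by the OpenAI chat API; the function name `to_chatml` is hypothetical, and a tokenizer's chat template may already handle this for you.

```python
def to_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render OpenAI-style messages into a ChatML prompt string."""
    turns = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        turns.append("<|im_start|>assistant\n")
    return "\n".join(turns)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

The `add_generation_prompt` flag mirrors the option of the same name in Hugging Face's `apply_chat_template`; set it to `False` when rendering a finished conversation rather than a prompt awaiting a reply.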

## Prompt Format for JSON Mode / Structured Outputs

The model was also trained on a specific system prompt for structured outputs, under which it should respond with **only** a JSON object that follows a given JSON schema.

You can generate your schema from a Pydantic object using the standalone script `jsonmode.py` in our codebase: https://github.com/NousResearch/Hermes-Function-Calling/tree/main

```
<|im_start|>system
You are a helpful assistant that answers in JSON. Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|im_end|>
```

Given the {schema} you provide, the model should follow that JSON format to create its response. All you have to do is give a typical user prompt, and it will respond in JSON.
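As a sketch of the full round trip, the snippet below fills the JSON-mode system prompt with a schema and then checks that a reply parses as JSON and contains the schema's required keys. It assumes the `\n` sequences in the template stand for real newlines, uses a hand-written schema dictionary in place of one generated by `jsonmode.py`, and feeds in a hypothetical model reply; `build_json_system_prompt` and `check_reply` are illustrative names only.

```python
import json

# Hand-written example schema (hypothetical); jsonmode.py can derive one from a Pydantic model.
SCHEMA = {
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}


def build_json_system_prompt(schema: dict) -> str:
    """Fill the JSON-mode system prompt, treating each \\n as a real newline."""
    return (
        "<|im_start|>system\n"
        "You are a helpful assistant that answers in JSON. "
        "Here's the json schema you must adhere to:\n"
        f"<schema>\n{json.dumps(schema)}\n</schema><|im_end|>"
    )


def check_reply(reply: str, schema: dict) -> dict:
    """Parse a model reply and confirm the schema's required keys are present.

    This is only a shallow check; it does not validate value types.
    """
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"reply is missing required keys: {missing}")
    return data


print(build_json_system_prompt(SCHEMA))
# Hypothetical model reply, used here only to exercise check_reply.
print(check_reply('{"name": "Ada", "age": 36}', SCHEMA))
```

The key check is deliberately shallow; enforcing types and nested constraints would need a full JSON Schema validator.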

# Benchmarks

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/vOYv9wJUMn1Xrf4BvmO_x.png)

## GPT4All:
```
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.5520|±  |0.0145|
|             |       |acc_norm|0.5887|±  |0.0144|
|arc_easy     |      0|acc     |0.8350|±  |0.0076|
|             |       |acc_norm|0.8123|±  |0.0080|
|boolq        |      1|acc     |0.8584|±  |0.0061|
|hellaswag    |      0|acc     |0.6265|±  |0.0048|
|             |       |acc_norm|0.8053|±  |0.0040|
|openbookqa   |      0|acc     |0.3800|±  |0.0217|
|             |       |acc_norm|0.4580|±  |0.0223|
|piqa         |      0|acc     |0.8003|±  |0.0093|
|             |       |acc_norm|0.8118|±  |0.0091|
|winogrande   |      0|acc     |0.7490|±  |0.0122|
```
Average: 72.62
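The card does not say how the averages are computed, but the reported figures are consistent with a simple rule: take `acc_norm` where a task reports it, otherwise `acc`, and average across tasks. The sketch below applies that rule to the GPT4All table; the rule is inferred from the numbers rather than taken from a documented procedure, and `suite_average` is a hypothetical helper.

```python
# Scores copied from the GPT4All table; acc_norm appears only where reported.
GPT4ALL = {
    "arc_challenge": {"acc": 0.5520, "acc_norm": 0.5887},
    "arc_easy": {"acc": 0.8350, "acc_norm": 0.8123},
    "boolq": {"acc": 0.8584},
    "hellaswag": {"acc": 0.6265, "acc_norm": 0.8053},
    "openbookqa": {"acc": 0.3800, "acc_norm": 0.4580},
    "piqa": {"acc": 0.8003, "acc_norm": 0.8118},
    "winogrande": {"acc": 0.7490},
}


def suite_average(results: dict[str, dict[str, float]]) -> float:
    """Average per-task scores as a percentage, preferring acc_norm over acc (inferred rule)."""
    scores = [metrics.get("acc_norm", metrics["acc"]) for metrics in results.values()]
    return 100 * sum(scores) / len(scores)


print(f"{suite_average(GPT4ALL):.2f}")  # 72.62
```

By the same arithmetic, the eight AGIEval `acc_norm` scores average to 42.4375 (reported as 42.44), and the eighteen BigBench `multiple_choice_grade` scores average to about 43.55, matching the reported AGIEval and BigBench averages.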

## AGIEval:
```
|             Task             |Version| Metric |Value |   |Stderr|
|------------------------------|------:|--------|-----:|---|-----:|
|agieval_aqua_rat              |      0|acc     |0.2520|±  |0.0273|
|                              |       |acc_norm|0.2559|±  |0.0274|
|agieval_logiqa_en             |      0|acc     |0.3548|±  |0.0188|
|                              |       |acc_norm|0.3625|±  |0.0189|
|agieval_lsat_ar               |      0|acc     |0.1826|±  |0.0255|
|                              |       |acc_norm|0.1913|±  |0.0260|
|agieval_lsat_lr               |      0|acc     |0.5510|±  |0.0220|
|                              |       |acc_norm|0.5255|±  |0.0221|
|agieval_lsat_rc               |      0|acc     |0.6431|±  |0.0293|
|                              |       |acc_norm|0.6097|±  |0.0298|
|agieval_sat_en                |      0|acc     |0.7330|±  |0.0309|
|                              |       |acc_norm|0.7039|±  |0.0319|
|agieval_sat_en_without_passage|      0|acc     |0.4029|±  |0.0343|
|                              |       |acc_norm|0.3689|±  |0.0337|
|agieval_sat_math              |      0|acc     |0.3909|±  |0.0330|
|                              |       |acc_norm|0.3773|±  |0.0328|
```
Average: 42.44

## BigBench:
```
|                      Task                      |Version|       Metric        |Value |   |Stderr|
|------------------------------------------------|------:|---------------------|-----:|---|-----:|
|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.5737|±  |0.0360|
|bigbench_date_understanding                     |      0|multiple_choice_grade|0.6667|±  |0.0246|
|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3178|±  |0.0290|
|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.1755|±  |0.0201|
|                                                |       |exact_str_match      |0.0000|±  |0.0000|
|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.3120|±  |0.0207|
|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.2014|±  |0.0152|
|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.5500|±  |0.0288|
|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.4300|±  |0.0222|
|bigbench_navigate                               |      0|multiple_choice_grade|0.4980|±  |0.0158|
|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.7010|±  |0.0102|
|bigbench_ruin_names                             |      0|multiple_choice_grade|0.4688|±  |0.0236|
|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.1974|±  |0.0126|
|bigbench_snarks                                 |      0|multiple_choice_grade|0.7403|±  |0.0327|
|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.5426|±  |0.0159|
|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.5320|±  |0.0158|
|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.2280|±  |0.0119|
|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1531|±  |0.0086|
|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.5500|±  |0.0288|
```
Average: 43.55

## TruthfulQA:
```
|    Task     |Version|Metric|Value|   |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc|      1|mc1   |0.410|±  |0.0172|
|             |       |mc2   |0.578|±  |0.0157|
```

# How to cite:

```bibtex
@misc{Hermes-2-Pro-Llama-3-8B,
  url={https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B},
  title={Hermes-2-Pro-Llama-3-8B},
  author={Teknium and interstellarninja and theemozilla and karan4d and huemin_art}
}
```