# lesser-hermes

## Why?
Hermes is very, um, Hermes-y. I wanted to dilute it so I could use it as an ingredient for other things. Sampling Hermes is a pain in the ass: it either sounds super model-esque or it loses all instructability. Hence, dilution back to the root.
This is a merge of pre-trained language models created using [mergekit](https://github.com/arcee-ai/mergekit). We've been using this as one of the experimental ingredients to help stabilize the monkey-typewriter merges, and it's kinda okay at that.

Note that modern mergekit handles MoE just fine now. But back in the day it did a horrible job, and only the fork worked properly.
## Merge Details

### Merge Method
This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, with [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as the base.
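For the curious, here's roughly what DARE TIES does to each weight tensor. This is a toy numpy sketch, not mergekit's actual implementation (which handles normalization options, per-tensor parameters, and a pile of edge cases); the function name and arguments are made up for illustration:

```python
import numpy as np

def dare_ties(base, tuned, densities, weights, seed=0):
    """Toy DARE TIES merge of one tensor from several fine-tunes.

    tuned:     list of arrays shaped like `base`
    densities: DARE keep-probability per model (e.g. 0.25)
    weights:   per-model scale applied to the surviving delta
    """
    rng = np.random.default_rng(seed)
    deltas = []
    for t, d, w in zip(tuned, densities, weights):
        delta = t - base                      # the model's "task vector"
        keep = rng.random(delta.shape) < d    # DARE: randomly drop (1 - d)...
        delta = np.where(keep, delta / d, 0)  # ...and rescale survivors by 1/d
        deltas.append(w * delta)
    stacked = np.stack(deltas)
    sign = np.sign(stacked.sum(axis=0))       # TIES: elect a per-element sign
    agree = np.sign(stacked) == sign          # drop deltas fighting the majority
    return base + np.where(agree, stacked, 0).sum(axis=0)
```

So a low `density` like 0.25 means most of the Hermes task vector gets zeroed out before merging, which is exactly the dilution we're after here.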
### Models Merged

The following models were included in the merge:

* [NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO)
### Configuration

The following YAML configuration was used to produce this model:
```yaml
models:
  # dont bagel me bro
  - model: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
    parameters:
      density: 0.25
      weight: 0.3
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    parameters:
      density: 0.5
      weight: 1
merge_method: dare_ties
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
parameters:
  #normalize: false
  #int8_mask: true
dtype: bfloat16
```
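If you want to reproduce this, save the YAML above as something like `config.yml` (the filename is arbitrary) and run it through mergekit's CLI, e.g. `mergekit-yaml config.yml ./lesser-hermes`. You'll need enough disk to materialize both Mixtral-sized source models.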