lesser-hermes

Why?

Hermes is very, um, Hermes-y. I wanted to dilute it so I could use it as an ingredient for other things. Sampling Hermes is a pain in the ass: it either sounds super model-esque or it loses all instructability. Hence, dilution back to the root.

This is a merge of pre-trained language models created using mergekit. We've been using this as one of the experimental ingredients to help stabilize the monkey-typewriter merges, and it's kinda okay at that.

Note that modern mergekit handles MoE just fine now. Back in the day, though, it did a horrible job and only the fork worked properly.

Merge Details

Merge Method

This model was merged using the DARE TIES merge method, with mistralai/Mixtral-8x7B-Instruct-v0.1 as the base.
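
For the uninitiated: DARE TIES sparsifies each model's diff against the base (the config's density controls how much of the diff survives, with survivors rescaled to compensate), then TIES-style sign election resolves conflicting updates before the weighted deltas land back on the base. A loose per-tensor sketch, assuming torch; the helper name and the exact election/normalization details are my paraphrase, not mergekit's actual code:

import torch

def dare_ties(base, deltas, densities, weights):
    # deltas are task vectors: (finetuned - base), one per merged model
    pruned = []
    for delta, density in zip(deltas, densities):
        # DARE: randomly keep each element with probability `density`,
        # then rescale survivors by 1/density so the expectation matches
        mask = torch.bernoulli(torch.full_like(delta, density))
        pruned.append(mask * delta / density)

    # TIES: weight each sparse delta, elect a majority sign per element,
    # and drop contributions that fight the elected sign
    stacked = torch.stack([w * d for w, d in zip(weights, pruned)])
    sign = torch.sign(stacked.sum(dim=0))
    agree = (torch.sign(stacked) == sign).to(stacked.dtype)
    return base + (stacked * agree).sum(dim=0)

So with the density: 0.25 below, only about a quarter of the Hermes delta survives, scaled up 4x. Dilution, as promised.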

Models Merged

The following models were included in the merge:

NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO

Configuration

The following YAML configuration was used to produce this model:

models:
  # dont bagel me bro
  - model: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
    parameters:
      density: 0.25
      weight: 0.3
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    parameters:
      density: 0.5
      weight: 1
merge_method: dare_ties
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
parameters:
  #normalize: false
  #int8_mask: true
dtype: bfloat16
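
If you want to rerun this, mergekit's mergekit-yaml entry point takes the config plus an output directory, roughly: mergekit-yaml config.yml ./lesser-hermes.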