lesser-hermes

Why?

Hermes is very, um, Hermes-y. I wanted to dilute it so I could use it as an ingredient for other things. Sampling Hermes is a pain in the ass: it either sounds super model-esque or it loses all instructability. Hence, dilution back to the root.

This is a merge of pre-trained language models created using mergekit. We've been using this as one of the experimental ingredients to help stabilize the monkey-typewriter merges, and it's kinda okay at that.

Note that modern mergekit handles MoE just fine now. Back in the day, though, it did a horrible job and only the fork worked properly.

Merge Details

Merge Method

This model was merged using the DARE TIES merge method, with mistralai/Mixtral-8x7B-Instruct-v0.1 as the base.
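
For the uninitiated: DARE TIES sparsifies each model's diff against the base (the config's density controls how much of the diff survives, with survivors rescaled to compensate), then TIES-style sign election resolves conflicting updates before the weighted deltas land back on the base. A loose per-tensor sketch, assuming torch; the helper name and the exact election/normalization details are my paraphrase, not mergekit's actual code:

import torch

def dare_ties(base, deltas, densities, weights):
    # deltas are task vectors: (finetuned - base), one per merged model
    pruned = []
    for delta, density in zip(deltas, densities):
        # DARE: randomly keep each element with probability `density`,
        # then rescale survivors by 1/density so the expectation matches
        mask = torch.bernoulli(torch.full_like(delta, density))
        pruned.append(mask * delta / density)

    # TIES: weight each sparse delta, elect a majority sign per element,
    # and drop contributions that fight the elected sign
    stacked = torch.stack([w * d for w, d in zip(weights, pruned)])
    sign = torch.sign(stacked.sum(dim=0))
    agree = (torch.sign(stacked) == sign).to(stacked.dtype)
    return base + (stacked * agree).sum(dim=0)

So with the density: 0.25 below, only about a quarter of the Hermes delta survives, scaled up 4x. Dilution, as promised.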

Models Merged

The following models were included in the merge:

NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO

Configuration

The following YAML configuration was used to produce this model:

models:
  # dont bagel me bro
  - model: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
    parameters:
      density: 0.25
      weight: 0.3
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    parameters:
      density: 0.5
      weight: 1
merge_method: dare_ties
base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
parameters:
  #normalize: false
  #int8_mask: true
dtype: bfloat16
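
If you want to rerun this, mergekit's mergekit-yaml entry point takes the config plus an output directory, roughly: mergekit-yaml config.yml ./lesser-hermes.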