Aqueducts 18B

A fiendishly complex stack-megamerge consisting mostly of Mistral/Solar-based models. This proof-of-concept stack is intended to demonstrate the correct way to build up from 7B to 18B while maintaining coherency/intelligence and minimizing repetition/jank.

This isn't my recipe. It was engineered by FM, so the credit/blame is his. Here is how he explains what's going on:

"Stack" merging exists as an alternative to straight up merging of models; its general idea comes from the fact that in a stacked arrangement the models will preserve their weights better than when merged in any way. Unfortunately, the results are often not so predictable as we'd wish them to be, and the models end up losing their crucial capabilities, thus invalidating the whole point of preserving them in the first place.

Over the course of irregular, iterative experiments (Jan–Apr '24), some conclusions were reached:

  1. The naive "Frankenmerge" stacking of slices of models doesn't preserve the input and output layers of the participating models; however, if those layers are merged beforehand and reused for the whole stacked model, the capabilities of the participating models appear to be at least partially restored.
  2. The often-overlooked gradient merge, while not doing much to enhance simple merges of models, proves crucial for saving space (layers) when attempting to stack models "lengthwise". In this recipe, the target was to approximate the prompt passing through the internal layers of three 11B models while fitting within the space of two. Straight stacking of three such models would have produced a 22B-parameter model with 96 layers, while this construction allows us to use just 80 (a sketch of the gradient idea follows after this list).
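
A rough sketch of that gradient idea, again with placeholder names and ranges (the real thing is in recipe.yml and is considerably more involved): a slerp merge whose interpolation factor t ramps from 0 to 1 across the slice, so the blended span starts out reading like the top of one model and ends up reading like the bottom of the next. In a full build this chunk would be produced in its own mergekit pass and then passthrough-stacked between the untouched slices.

```yaml
# Hypothetical gradient "splice": blend the upper half of model A into the
# lower half of model B instead of stacking both spans outright, so one
# 24-layer span does the work of two. Assumes mergekit's slerp gradient syntax.
slices:
  - sources:
      - model: some-org/solar-finetune-a
        layer_range: [24, 48]             # upper half of A
      - model: some-org/solar-finetune-b
        layer_range: [0, 24]              # lower half of B
merge_method: slerp
base_model: some-org/solar-finetune-a
parameters:
  t:
    - value: [0.0, 0.25, 0.5, 0.75, 1.0]  # 0 = all A at the bottom, 1 = all B at the top
dtype: float16
```

The saving comes from the overlap: each blended span stands in for two stacked ones, which is how a 96-layer straight stack can be squeezed toward 80 layers.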

Note: the results achieved are mostly subjective and not confirmed by rigorous testing. Note 2: for gradient merging of 11B models, it's highly advisable to study their structure; since such a model is, at inception, made of layers of a duplicated 7B model, it is preferable to merge the layer slices that align with each other internally (see the sketch below). This will become irrelevant soon because Solar is old.
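
To make the alignment advice concrete: Solar-style 11B models were built by depth-upscaling a 32-layer 7B, roughly by stacking layers 0-23 of one copy on layers 8-31 of another to get 48 layers. Assuming that layout, the gradient slice from the earlier sketch can be shifted so that both spans sit over the same underlying base layers:

```yaml
# Alignment-aware variation of the gradient slice above (hypothetical models).
# Under the assumed Solar layout, A's layers 24-47 and B's layers 8-31 both
# correspond to base layers 8-31, so the layer pairs being blended line up
# with each other internally.
slices:
  - sources:
      - model: some-org/solar-finetune-a
        layer_range: [24, 48]
      - model: some-org/solar-finetune-b
        layer_range: [8, 32]
merge_method: slerp
base_model: some-org/solar-finetune-a
parameters:
  t:
    - value: [0.0, 0.5, 1.0]
dtype: float16
```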

See recipe.yml if you want to examine the madness in detail.

This model is uncensored and capable of generating objectionable material. As with any LLM, no factual claims made by the model should be taken at face value. You know that boilerplate safety disclaimer that most professional models have? Assume this has it too. This model is for entertainment purposes only.

GGUFs: https://huggingface.co/MarsupialAI/Aqueducts-18B_iMatrix_GGUF

Model size: 17.7B parameters (FP16, safetensors)
