MarsupialAI/Yeet_51b_200k

@@ -4,13 +4,20 @@ license_name: yi-other
 ---
 # Yeet 51b 200k
-This model is a rotating-stack merge of three Yi 34b 200k models in a 51b (90 layer) configuration.  My reasoning behind this merge was twofold:  I'd never seen a stacked merge made from 34b models, and I thought that maybe this could give near-70b performance but with a much larger context window, but still fitting within 48GB of VRAM.  I think the results are quite good.  The model does as well as many 70b models at RP/ERP, chat, and storywriting.  At Q4_K_S it will fit into a pair of 24GB GPUs with 32k context.  Coherency at 32k is excellent, and will probably remain very good well beyond that thanks to the 200k base training.
 The gotcha here is speed.  While it inferences as you'd expect for the model size, it's much slower than a similarly-sized 8x7b MoE.  And while I personally find the output of this model to outperform any mixtral finetune I've seen so far, those finetunes are getting better all the time, and this really is achingly slow with a lot of context.  I'm getting less than half a token per second on a pair of P40s with a full 32k prompt.
 But that's not to say this model (or even the 51b stack concept) is useless.  If you're patient, you can get extremely good output with very deep context on attainable hardware.  There are undoubtedly niche scenarios where this model or similarly-constructed models might be ideal.
 # Sample output
@@ -38,8 +45,7 @@ In the end, it was just a simple story about a cute and fluffy bunny who venture
 # Prompt format
-Seems to have the strongest affinity for Alpaca prompts, but Vicuna works as well.  Considering the variety of components, most
-formats will probbaly work to some extent.
 # WTF is a rotating-stack merge?

 ---
 # Yeet 51b 200k
+This model is a rotating-stack merge of three Yi 34b 200k models in a 51b (90 layer) configuration.  See My reasoning behind this merge was twofold:  I'd never seen a stacked merge made from 34b models, and I thought that maybe this could give near-70b performance but with a much larger context window, but still fitting within 48GB of VRAM.  I think the results are quite good.  The model does as well as many 70b models at RP/ERP, chat, and storywriting.  At Q4_K_S it will fit into a pair of 24GB GPUs with 32k context.  Coherency at 32k is excellent, and will probably remain very good well beyond that thanks to the 200k base training.
 The gotcha here is speed.  While it inferences as you'd expect for the model size, it's much slower than a similarly-sized 8x7b MoE.  And while I personally find the output of this model to outperform any mixtral finetune I've seen so far, those finetunes are getting better all the time, and this really is achingly slow with a lot of context.  I'm getting less than half a token per second on a pair of P40s with a full 32k prompt.
 But that's not to say this model (or even the 51b stack concept) is useless.  If you're patient, you can get extremely good output with very deep context on attainable hardware.  There are undoubtedly niche scenarios where this model or similarly-constructed models might be ideal.
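The claim above that the Q4_K_S quant fits on a pair of 24GB GPUs with 32k context can be sanity-checked with some rough arithmetic. This is only a back-of-envelope sketch, not a measurement: the 51b parameter count and 90 layers come from this card, while the bits-per-weight figure for Q4_K_S, the Yi-34B attention shape (8 KV heads, head dimension 128), and the fp16 KV cache are assumptions that should be checked against the actual model config and loader settings.

```python
# Back-of-envelope VRAM estimate. Every constant marked "assumed" is an
# assumption for illustration, not a value taken from the model card.
N_PARAMS = 51e9          # "51b" from the model name
BITS_PER_WEIGHT = 4.5    # assumed: rough average for a Q4_K_S quant
N_LAYERS = 90            # from the model card
N_KV_HEADS = 8           # assumed: Yi-34B grouped-query attention
HEAD_DIM = 128           # assumed: Yi-34B head dimension
KV_BYTES = 2             # assumed: fp16 KV cache
CONTEXT = 32 * 1024      # 32k tokens

GIB = 1024 ** 3


def weight_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context, bytes_per_value):
    """Keys and values (factor of 2) for every layer and every cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_value


weights_gib = weight_bytes(N_PARAMS, BITS_PER_WEIGHT) / GIB
kv_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT, KV_BYTES) / GIB
total_gib = weights_gib + kv_gib

print(f"weights : {weights_gib:.1f} GiB")
print(f"KV cache: {kv_gib:.1f} GiB")
print(f"total   : {total_gib:.1f} GiB")
```

Under these assumptions the total lands at roughly 38 GiB, which leaves some headroom on 2x24GB for compute buffers and other overhead that this sketch ignores.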
+Component models for the rotating stack are:
+- adamo1139/Yi-34B-200K-AEZAKMI-v2
+- brucethemoose/Yi-34B-200K-DARE-megamerge-v8
+- taozi555/RpBird-Yi-34B-200k
+This model is uncensored and perfectly capable of generating objectionable material. However, it is not an explicitly NSFW model, and in my experience it has never "gone rogue" and tried to insert NSFW content into SFW prompts. As with any LLM, no factual claims made by the model should be taken at face value. You know that boilerplate safety disclaimer that most professional models have?  Assume this has it too. This model is for entertainment purposes only.
+FP16 and Q4_K_S GGUFs are located here:  https://huggingface.co/MarsupialAI/Yeet_51b_200k_GGUF_Q4KS_FP16
 # Sample output
 # Prompt format
+Seems to work fine with Alpaca prompts.  Considering the variety of components, other formats are likely to work to some extent.
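For anyone unfamiliar with the Alpaca format mentioned above, here is a minimal sketch of building a single-turn prompt. The template is the commonly used no-input Alpaca wording; the helper name `build_alpaca_prompt` and the example instruction are hypothetical, and nothing here is confirmed as the exact template used when testing this model.

```python
# Commonly used Alpaca template (no-input variant). Assumed here for
# illustration; the model card only says "Alpaca prompts" work fine.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the Alpaca template (hypothetical helper)."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


# Hypothetical example instruction.
prompt = build_alpaca_prompt("Write a short story about a cute and fluffy bunny.")
print(prompt)
```

The resulting string is passed to the backend as the raw prompt, and the model's reply is whatever it generates after the `### Response:` line.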
 # WTF is a rotating-stack merge?