---
license: llama3
---

This is an experimental 2x8B MoE with random gates, built from the following two models:

- Hermes-2-Theta-Llama-3-8B by Nous Research: https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B
- llama-3-cat-8b-instruct-v1 by TheSkullery: https://huggingface.co/TheSkullery/llama-3-cat-8b-instruct-v1

***Important***

Make sure to add `</s>` as a stop sequence, since llama-3-cat-8b-instruct-v1 is used as the base model.
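
In most chat front ends this just means adding `</s>` to the custom stop sequence list. For programmatic use with Hugging Face `transformers`, a minimal sketch might look like this (the repo id is a placeholder, and the `stop_strings` argument needs a reasonably recent `transformers` release):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id for the merged 2x8B MoE; point it at the actual weights.
model_id = "your-username/cat-hermes-2x8b-moe"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a short hello."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# "</s>" is passed as an explicit stop string so generation halts where the
# cat-8b-instruct base model ends its turns.
output = model.generate(
    input_ids,
    max_new_tokens=256,
    stop_strings=["</s>"],
    tokenizer=tokenizer,  # required so transformers can match the stop string
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```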
Update:

Due to requests, I decided to add the rest of the quants. Enjoy!

Mergekit recipe for the model, in case you're too lazy to check the files:
```
base_model: TheSkullery/llama-3-cat-8b-instruct-v1
gate_mode: random
dtype: bfloat16
experts_per_token: 2
experts:
  - source_model: TheSkullery/llama-3-cat-8b-instruct-v1
    positive_prompts:
      - " "
  - source_model: NousResearch/Hermes-2-Theta-Llama-3-8B
    positive_prompts:
      - " "
```
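
To reproduce the merge, something along these lines should work. This is only a sketch: it assumes mergekit is installed (it provides the `mergekit-moe` entry point), that the recipe above is saved as `moe-config.yaml`, and that the output directory name is arbitrary; mergekit's Llama MoE merges typically come out as Mixtral-architecture checkpoints.

```python
import subprocess
from transformers import AutoConfig

# Run mergekit's MoE merge on the recipe above, assumed saved as "moe-config.yaml".
# "./cat-hermes-2x8b" is just an example output directory.
subprocess.run(["mergekit-moe", "moe-config.yaml", "./cat-hermes-2x8b"], check=True)

# The merged checkpoint comes out as a Mixtral-style MoE. With only 2 experts and
# experts_per_token: 2, every token is routed through both experts, so the random
# gates mainly decide how the two expert outputs are weighted.
cfg = AutoConfig.from_pretrained("./cat-hermes-2x8b")
print(cfg.model_type, cfg.num_local_experts, cfg.num_experts_per_tok)
```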