---
license: mit
---
|
# 🔥 MoE-Mixtral-7B-8Expert
|
[mixtral-8x7b](https://huggingface.co/someone13574/mixtral-8x7b-32kseqlen) is a Mixture-of-Experts (MoE) model.
|
[LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory) supports inference and finetuning of this model.
|
|
|
## 🚀 Features
|
With LLaMA2-Accessory, mixtral-8x7b enjoys the following features: |
|
1. Distributed MoE (i.e., instantiating experts on multiple processes/GPUs)

2. Load-balancing loss (see the sketch after this list)

3. Tensor parallelism and FSDP for efficient training

4. Distributed and/or quantized inference
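
To make features 1 and 2 more concrete, below is a minimal, self-contained PyTorch sketch of a top-2 MoE layer with a Switch-Transformer-style load-balancing auxiliary loss. The names (`SimpleMoE`, `aux_loss`) are hypothetical and the routing runs on a single device; the actual LLaMA2-Accessory implementation additionally shards the experts across processes/GPUs.

```python
# Minimal sketch of a top-2 MoE layer with a load-balancing loss.
# Illustrative only -- not the LLaMA2-Accessory implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.num_experts, self.top_k = num_experts, top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, dim)
        logits = self.gate(x)                               # (tokens, experts)
        probs = logits.softmax(dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

        # Dispatch each token to its selected experts and mix the outputs.
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_probs[mask, slot, None] * expert(x[mask])

        # Load-balancing loss: fraction of tokens assigned to each expert
        # times the mean gate probability for that expert, summed and scaled.
        token_frac = F.one_hot(topk_idx[:, 0], self.num_experts).float().mean(dim=0)
        prob_frac = probs.mean(dim=0)
        aux_loss = self.num_experts * (token_frac * prob_frac).sum()
        return out, aux_loss


# Usage: add aux_loss (weighted) to the main training loss.
tokens = torch.randn(16, 64)
moe = SimpleMoE(dim=64, hidden_dim=128)
y, aux = moe(tokens)
print(y.shape, aux.item())
```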
|
|
|
## 🔥 Online Demo
|
We host a web demo [💻here](http://106.14.127.192/), which serves a mixtral-8x7b model finetuned on [evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) and [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) with LoRA and bias tuning.
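
For readers unfamiliar with the finetuning recipe mentioned above, here is an illustrative PyTorch sketch of LoRA plus bias tuning: the base weights stay frozen and only the low-rank adapter matrices and bias vectors receive gradients. `LoRALinear` and `mark_trainable` are hypothetical names, not the LLaMA2-Accessory API.

```python
# Illustrative sketch of "LoRA + bias" tuning -- not the LLaMA2-Accessory code.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.scale = alpha / rank
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


def mark_trainable(model: nn.Module):
    # Freeze everything except LoRA matrices and bias vectors.
    for name, param in model.named_parameters():
        param.requires_grad = ("lora_" in name) or name.endswith(".bias")
```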
|
|
|
## 💡 Tutorial
|
A detailed tutorial is available in our [documentation](https://llama2-accessory.readthedocs.io/en/latest/projects/mixtral-8x7b.html).