---
license: mit
---

# 🔥 MoE-Mixtral-7B-8Expert

mixtral-8x7b is a Mixture-of-Experts (MoE) model. LLaMA2-Accessory now supports its inference and finetuning.
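
For readers unfamiliar with MoE, the sketch below illustrates the kind of sparse MoE feed-forward block mixtral-8x7b uses: a router selects the top-2 of 8 experts per token and mixes their outputs with renormalized gate weights. The class name, sizes, and the simple loop-based dispatch are illustrative assumptions, not the LLaMA2-Accessory implementation.

```python
# Minimal, illustrative sketch of a Mixtral-style top-2 MoE block (not the actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, dim: int, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim)
        logits = self.gate(x)                                # (tokens, experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # top-2 experts per token
        weights = F.softmax(weights, dim=-1)                 # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e                    # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out
```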

## 🚀 Features

With LLaMA2-Accessory, mixtral-8x7b enjoys the following features:

  1. Distributed MoE (i.e., instantiating experts on multiple processes/GPUs)
  2. Load Balancing Loss (see the sketch after this list)
  3. Tensor Parallelism and FSDP for efficient training
  4. Distributed and/or quantized inference
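
For reference, the load-balancing term is commonly a Switch-Transformer-style auxiliary loss that encourages the router to spread tokens evenly across experts. The function below is a minimal sketch under that assumption; the exact formulation used in LLaMA2-Accessory may differ.

```python
# Sketch of a Switch-Transformer-style auxiliary load-balancing loss (illustrative only).
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, top_k: int = 2) -> torch.Tensor:
    """router_logits: (num_tokens, num_experts) raw gate outputs for one MoE layer."""
    num_experts = router_logits.shape[-1]
    probs = F.softmax(router_logits, dim=-1)                   # (tokens, experts)
    _, selected = probs.topk(top_k, dim=-1)                    # experts chosen per token
    expert_mask = F.one_hot(selected, num_experts).float()     # (tokens, top_k, experts)
    # Fraction of routed tokens that each expert receives.
    tokens_per_expert = expert_mask.sum(dim=(0, 1)) / expert_mask.sum()
    # Mean router probability assigned to each expert.
    router_prob_per_expert = probs.mean(dim=0)
    # Minimized when both distributions are uniform (1 / num_experts each).
    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)
```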

## 🔥 Online Demo

We host a web demo at https://5e1109637f49baae47.gradio.live/, which serves a mixtral-8x7b model finetuned on evol-codealpaca-v1 and ultrachat_200k with LoRA and Bias tuning. Please note that this is a temporary link; we will replace it with our official permanent link soon.

## 💡 Tutorial

A detailed tutorial is available at https://llama2-accessory.readthedocs.io/en/latest/projects/mixtral-8x7b.html