---
license: mit
---
# πŸ”₯ MoE-Mixtral-7B-8Expert
[mixtral-8x7b](https://huggingface.co/someone13574/mixtral-8x7b-32kseqlen) is a Mixture-of-Experts (MoE) model.
[LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory) now supports its inference and finetuning.
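As a starting point, the checkpoint linked above can be fetched with `huggingface_hub`. The snippet below is a minimal sketch only; the local destination directory is an assumption, and any conversion steps required by LLaMA2-Accessory are covered in its own documentation.

```python
# Minimal sketch: download the mixtral-8x7b-32kseqlen checkpoint referenced above.
# The destination directory is an assumption; adjust it to your local setup.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="someone13574/mixtral-8x7b-32kseqlen",
    local_dir="./mixtral-8x7b-32kseqlen",  # assumed local path
)
print(f"Checkpoint downloaded to {local_dir}")
```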
## πŸš€ Features
With LLaMA2-Accessory, mixtral-8x7b enjoys the following features:
1. Distributed MoE (i.e., instantiating experts across multiple processes/GPUs)
2. Load balancing loss (see the sketch after this list)
3. Tensor parallelism and FSDP for efficient training
4. Distributed and/or quantized inference
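For context on feature 2, below is an illustrative sketch of a Switch-Transformer-style load balancing auxiliary loss, which encourages tokens to be spread evenly across experts. It is not the LLaMA2-Accessory implementation; the top-1 routing and tensor shapes are assumptions for illustration.

```python
# Illustrative sketch (not the LLaMA2-Accessory implementation) of a
# Switch-Transformer-style load balancing loss for MoE routing.
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, num_experts: int) -> torch.Tensor:
    """router_logits: (num_tokens, num_experts) raw gate scores."""
    probs = F.softmax(router_logits, dim=-1)             # (tokens, experts)
    top1 = probs.argmax(dim=-1)                          # expert chosen per token
    # f_i: fraction of tokens dispatched to expert i (assumes top-1 routing)
    dispatch_frac = F.one_hot(top1, num_experts).float().mean(dim=0)
    # P_i: mean router probability assigned to expert i
    mean_prob = probs.mean(dim=0)
    # Minimized when both routing decisions and probabilities are uniform.
    return num_experts * torch.sum(dispatch_frac * mean_prob)

# Example: 16 tokens routed over 8 experts
logits = torch.randn(16, 8)
print(load_balancing_loss(logits, num_experts=8))
```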
## πŸ”₯ Online Demo
We host a web demo at <https://5e1109637f49baae47.gradio.live/>, which shows a mixtral-8x7b model finetuned on
[evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) and
[ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), with LoRA and Bias tuning.
Please note that this is a temporary link; we will post the official permanent link shortly.
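For reference, the sketch below shows a minimal LoRA linear layer with a trainable bias, in the spirit of the LoRA and bias tuning mentioned above. It is illustrative only, not the exact module used by LLaMA2-Accessory; the rank, alpha, and layer sizes are assumptions.

```python
# Minimal illustrative LoRA linear layer (a sketch, not the LLaMA2-Accessory module):
# the frozen base weight is augmented with a trainable low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=True)
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        self.base.bias.requires_grad_(True)      # bias tuning: bias stays trainable
        self.lora_a = nn.Linear(in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)       # low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(4096, 4096)
print(layer(torch.randn(2, 4096)).shape)  # torch.Size([2, 4096])
```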
## πŸ’‘ Tutorial
A detailed tutorial is available at <https://llama2-accessory.readthedocs.io/en/latest/projects/mixtral-8x7b.html>