Marcel Bischoff committed · Commit b3c4e71 · 1 Parent(s): 7730b56

README

README.md CHANGED
@@ -17,10 +17,16 @@ tags:



-# phixtral-4x2_8
+# phixtral-4x2_8-gates-poc
+phixtral-4x2_8-gates-poc is [phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8)
+with fine-tuned gates for better expert selection and to break the symmetry between the experts.
+As a proof of concept, we used only 400 shorter samples
+from [openhermes](https://huggingface.co/datasets/teknium/openhermes).

 phixtral-4x2_8 is the first Mixture of Experts (MoE) made with four [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) models, inspired by the [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) architecture. It performs better than each individual expert.

+
+
 ## 🏆 Evaluation

 | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
@@ -109,4 +115,4 @@ A special thanks to [vince62s](https://huggingface.co/vince62s) for the inferenc

 Thanks to [Charles Goddard](https://github.com/cg123) for the [mergekit](https://github.com/cg123/mergekit) library and the implementation of the [MoE for clowns](https://goddard.blog/posts/clown-moe/).

-Thanks to [ehartford](https://huggingface.co/ehartford), [lxuechen](https://huggingface.co/lxuechen), [Yhyu13](https://huggingface.co/Yhyu13), and [mrm8488](https://huggingface.co/mrm8488) for their fine-tuned phi-2 models.
+Thanks to [ehartford](https://huggingface.co/ehartford), [lxuechen](https://huggingface.co/lxuechen), [Yhyu13](https://huggingface.co/Yhyu13), and [mrm8488](https://huggingface.co/mrm8488) for their fine-tuned phi-2 models.
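For readers unfamiliar with the architecture referenced in the diff above: phixtral-4x2_8 combines four phi-2 models behind a Mixtral-style gate that picks the top experts per token. The sketch below is a minimal, assumed illustration of that routing pattern; module names, sizes, and the top-k value are illustrative and not taken from the actual phixtral modeling code.

```python
# Minimal sketch of Mixtral-style top-k routing over four expert MLPs, as a rough
# picture of the phixtral-4x2_8 architecture. Names, sizes, and the top-k value are
# illustrative assumptions, not the actual phixtral modeling code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertMLP(nn.Module):
    """Stand-in for the feed-forward block of one phi-2 expert."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, intermediate_size)
        self.fc2 = nn.Linear(intermediate_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(F.gelu(self.fc1(x)))


class MoEBlock(nn.Module):
    """Routes each token to its top-k experts and mixes the outputs by gate weight."""

    def __init__(self, hidden_size: int = 2560, intermediate_size: int = 10240,
                 num_experts: int = 4, num_experts_per_tok: int = 2):
        super().__init__()
        self.num_experts_per_tok = num_experts_per_tok
        # The gate is the only part fine-tuned in the POC described above.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            ExpertMLP(hidden_size, intermediate_size) for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        scores = self.gate(x)                                  # (tokens, experts)
        weights, selected = torch.topk(scores, self.num_experts_per_tok, dim=-1)
        weights = F.softmax(weights, dim=-1)                   # normalise over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.num_experts_per_tok):
            for idx, expert in enumerate(self.experts):
                mask = selected[:, slot] == idx
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


# Example: route 8 tokens through the block.
block = MoEBlock()
tokens = torch.randn(8, 2560)
print(block(tokens).shape)  # torch.Size([8, 2560])
```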
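The change this commit describes (fine-tuning only the gates on roughly 400 shorter openhermes samples, so that expert selection improves and the symmetry between identically initialised gates is broken) could look roughly like the sketch below. The parameter-name filter, dataset field names, and hyperparameters are assumptions for illustration, not the exact recipe used for phixtral-4x2_8-gates-poc.

```python
# Hedged sketch of gate-only fine-tuning: freeze all phixtral-4x2_8 weights except the
# gate/router projections and train them on a small openhermes subset. The parameter-name
# filter ("gate"), dataset field names, and hyperparameters are assumptions, and the
# remote modeling code is assumed to compute a loss when `labels` are passed.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/phixtral-4x2_8"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Train only the gates so expert selection changes while the experts stay frozen.
for name, param in model.named_parameters():
    param.requires_grad = "gate" in name

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# A few hundred shorter samples, in the spirit of the 400-sample POC described above.
dataset = load_dataset("teknium/openhermes", split="train").select(range(400))

model.train()
for sample in dataset:
    text = sample["instruction"] + "\n" + sample["output"]  # assumed field names
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.save_pretrained("phixtral-4x2_8-gates-poc")
```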