update readme
README.md
CHANGED
@@ -12,7 +12,7 @@ tags:
 <br>Aria</br>
 </p> -->
 
-This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The
+This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The only modification is replacing [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP. In this configuration, each expert is implemented as a `torch.nn.Linear` layer executed in sequence. This adjustment simplifies quantization with current open-source libraries, which are optimized for `nn.Linear` layers.
 
 While the sequential MLP approach aids in easier quantization, using grouped GEMM provides the advantage of faster inference speed.
 
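
For readers unfamiliar with the trade-off the README describes, the following is a minimal sketch of what "each expert as a `torch.nn.Linear` layer executed in sequence" could look like. It is not taken from the Aria code base: the class name `SequentialExperts`, the `expert_ids` argument, top-1 routing, the bias-free layers, and the GELU activation are all assumptions chosen to keep the example short.

```python
import torch
from torch import nn
import torch.nn.functional as F


class SequentialExperts(nn.Module):
    """Hypothetical MoE expert block: one pair of nn.Linear layers per expert."""

    def __init__(self, num_experts: int, hidden: int, intermediate: int):
        super().__init__()
        # Plain nn.Linear modules are what most quantization tools look for.
        self.up = nn.ModuleList(
            [nn.Linear(hidden, intermediate, bias=False) for _ in range(num_experts)]
        )
        self.down = nn.ModuleList(
            [nn.Linear(intermediate, hidden, bias=False) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden); expert_ids: (tokens,), one expert per token
        # (top-1 routing is an assumption made for brevity).
        out = torch.zeros_like(x)
        # Experts run one after another; a grouped GEMM kernel would instead
        # process all experts in a single batched call.
        for e, (up, down) in enumerate(zip(self.up, self.down)):
            mask = expert_ids == e
            if mask.any():
                out[mask] = down(F.gelu(up(x[mask])))
        return out


torch.manual_seed(0)
experts = SequentialExperts(num_experts=4, hidden=8, intermediate=16)
tokens = torch.randn(5, 8)
routing = torch.tensor([0, 1, 1, 3, 0])
output = experts(tokens, routing)
```

The Python-level loop over experts is the likely source of the slower inference the README mentions, while the per-expert `nn.Linear` modules are what make the model straightforward to hand to quantization libraries.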