sheldonrobinson/Aria-sequential_mlp

Image-Text-to-Text

text-generation

aria-dev committed 19 days ago

Commit 5c6db29 • 1 Parent(s): e2e1cb9

update readme

Files changed (1)

README.md +3 -1

@@ -12,7 +12,9 @@ tags:
   <br>Aria</br>
 </p>  -->
-This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The primary modification is the replacement of [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP. In this setup, each expert is a `torch.nn.Linear` layer executed sequentially. This change facilitates easier quantization using current open-source libraries, which are optimized to quantize `nn.Linear` layers.
 ## Quick Start

   <br>Aria</br>
 </p>  -->
+This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The main modification involves replacing [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP. In this configuration, each expert is implemented as a `torch.nn.Linear` layer executed in sequence. This adjustment simplifies quantization with current open-source libraries, which are optimized for `nn.Linear` layers.
+While the sequential MLP approach aids in easier quantization, using grouped GEMM provides the advantage of faster inference speed.
 ## Quick Start
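
The README text in this diff describes each expert as a `torch.nn.Linear` layer executed in sequence. As a rough illustration of that idea, here is a minimal sketch. It is not the code from this repository: the class name `SequentialExperts`, the top-1 routing, and the single linear layer per expert are all simplifying assumptions made for the example.

```python
import torch
from torch import nn


class SequentialExperts(nn.Module):
    """Hypothetical MoE block: one nn.Linear per expert, run one after another."""

    def __init__(self, hidden: int, out: int, num_experts: int):
        super().__init__()
        # Plain nn.Linear modules, which quantization libraries typically target.
        self.experts = nn.ModuleList(
            [nn.Linear(hidden, out, bias=False) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden); expert_ids: (tokens,), one expert index per token
        # (top-1 routing is an assumption of this sketch).
        result = x.new_zeros(x.shape[0], self.experts[0].out_features)
        for idx, expert in enumerate(self.experts):
            mask = expert_ids == idx
            if mask.any():
                # Each expert processes only the tokens routed to it.
                result[mask] = expert(x[mask])
        return result


torch.manual_seed(0)
layer = SequentialExperts(hidden=8, out=4, num_experts=3)
x = torch.randn(5, 8)
expert_ids = torch.tensor([0, 2, 1, 2, 0])
y = layer(x, expert_ids)

# Reference: the same computation as a single batched contraction over
# stacked expert weights, which is the kind of work a grouped GEMM fuses.
stacked = torch.stack([e.weight for e in layer.experts])  # (experts, out, hidden)
reference = torch.einsum("th,toh->to", x, stacked[expert_ids])
```

The Python loop issues one small matrix multiply per expert, which is consistent with the README's note that grouped GEMM gives faster inference, while the per-expert `nn.Linear` modules are what make the model easier to quantize.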