aria-dev committed on
Commit
5c6db29
1 Parent(s): e2e1cb9

update readme

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -12,7 +12,9 @@ tags:
   <br>Aria</br>
   </p> -->
 
- This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The primary modification is the replacement of [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP. In this setup, each expert is a `torch.nn.Linear` layer executed sequentially. This change facilitates easier quantization using current open-source libraries, which are optimized to quantize `nn.Linear` layers.
+ This is a fork of the [rhymes-ai/Aria](https://huggingface.co/rhymes-ai/Aria) model. The main modification involves replacing [grouped GEMM](https://github.com/tgale96/grouped_gemm) with a sequential MLP. In this configuration, each expert is implemented as a `torch.nn.Linear` layer executed in sequence. This adjustment simplifies quantization with current open-source libraries, which are optimized for `nn.Linear` layers.
+
+ While the sequential MLP approach aids in easier quantization, using grouped GEMM provides the advantage of faster inference speed.
 
 
  ## Quick Start
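
The README change above describes replacing grouped GEMM with experts that are plain `torch.nn.Linear` layers run one after another. The following is a minimal sketch of that idea, not the fork's actual code: the class name `SequentialExperts`, the top-1 routing, the single bias-free `nn.Linear` per expert, and all tensor sizes are assumptions made for illustration.

```python
import torch
from torch import nn


class SequentialExperts(nn.Module):
    """Hypothetical sketch: one nn.Linear per expert, executed in a Python loop."""

    def __init__(self, num_experts: int, d_in: int, d_out: int) -> None:
        super().__init__()
        # Each expert is an ordinary nn.Linear, so tools that walk the module
        # tree looking for nn.Linear layers can see every expert individually.
        self.experts = nn.ModuleList(
            [nn.Linear(d_in, d_out, bias=False) for _ in range(num_experts)]
        )

    def forward(self, tokens: torch.Tensor, expert_ids: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tokens, d_in)
        # expert_ids: (num_tokens,), one expert index per token (top-1 routing assumed)
        out = tokens.new_zeros(tokens.shape[0], self.experts[0].out_features)
        for idx, expert in enumerate(self.experts):
            mask = expert_ids == idx
            if mask.any():
                # One small matmul per expert, issued sequentially.
                out[mask] = expert(tokens[mask])
        return out


torch.manual_seed(0)
layer = SequentialExperts(num_experts=4, d_in=8, d_out=16)
tokens = torch.randn(10, 8)
expert_ids = torch.randint(0, 4, (10,))

with torch.no_grad():
    out = layer(tokens, expert_ids)

# Every expert shows up as a separate nn.Linear submodule.
linear_names = [n for n, m in layer.named_modules() if isinstance(m, nn.Linear)]
print(out.shape, linear_names)
```

The loop issues one matrix multiply per expert, which is the source of the speed trade-off the README mentions: a grouped GEMM kernel would instead process all experts' token groups in a single fused call, at the cost of storing the expert weights in a form that `nn.Linear`-oriented quantization tools do not recognize.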