Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ This is a hybrid architecture between self-attention based Transformer and [RetNet]
This is the model weight accompanying the paper [Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers](https://arxiv.org/abs/2404.02684v1),
in which new Linear-Cost Inference models (e.g. RetNet) are not trained from scratch but transfer shared weight components from other PTLMs.
-The model's input/output embeddings, MLP weights, Layer Norms
+The model's input/output embeddings, MLP weights, and Layer Norms have been transferred from [pythia-1B](https://huggingface.co/EleutherAI/pythia-1b). For more details, please refer to the paper.

## Model Details
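
For illustration, here is a minimal sketch of what the transfer described in the change above might look like using the Hugging Face `transformers` API. The source side (pythia-1b, GPT-NeoX layout) is real; the target-side module names (`retnet.layers`, `input_layernorm`, `post_attention_layernorm`, `mlp`) and the `transfer_shared_weights` helper are assumptions for the sake of the example, not the repository's actual API — refer to the paper for the actual procedure.

```python
# Illustrative sketch only: copying the shared weight components
# (embeddings, MLPs, LayerNorms) from pythia-1b into a RetNet-style model.
import torch
from transformers import AutoModelForCausalLM

def transfer_shared_weights(source, target):
    """Copy input/output embeddings, per-layer MLP weights, and LayerNorms
    from a GPT-NeoX-style source into a target with matching shapes.
    Target module names below are hypothetical."""
    with torch.no_grad():
        # Input/output embeddings are shared across both architectures.
        target.get_input_embeddings().weight.copy_(source.get_input_embeddings().weight)
        target.get_output_embeddings().weight.copy_(source.get_output_embeddings().weight)
        # MLPs and LayerNorms transfer layer-by-layer; the attention/retention
        # weights are not covered here.
        for src_layer, tgt_layer in zip(source.gpt_neox.layers, target.retnet.layers):
            tgt_layer.mlp.load_state_dict(src_layer.mlp.state_dict())
            tgt_layer.input_layernorm.load_state_dict(src_layer.input_layernorm.state_dict())
            tgt_layer.post_attention_layernorm.load_state_dict(src_layer.post_attention_layernorm.state_dict())

source = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b")
# transfer_shared_weights(source, retnet_model)  # retnet_model: hypothetical target
```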