Commit dda7081 (parent: 1d86275)

Update README.md

README.md CHANGED
```diff
@@ -6,6 +6,8 @@ tags:
 - merge
 ---
 
+**Update: Yeah, this strategy doesn't work. This ended up really devastating the model's performance.**
+
 This model is an experiment involving mixing DARE TIE merger with a task arithmetic merger to attempt to merge models with less loss.
 
 DARE TIE mergers are [very strong at transferring strengths](https://medium.com/@minh.hoque/paper-explained-language-models-are-super-mario-2ebce6c2cf35) while merging a minimal part of the model. For larger models, 90-99% of delta parameters from SFT models can be dropped while retaining most of the benefits if they are rescaled and consensus merged back into the model.
```
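For context, the merge strategy the README describes operates on deltas ("task vectors"), i.e. the difference between an SFT model and its base. DARE randomly drops most of those delta parameters and rescales the survivors before they are merged back. The PyTorch sketch below is a hypothetical per-tensor illustration of that drop-and-rescale step combined with a plain task-arithmetic add-back; it is not the actual recipe behind this model, and it omits the TIES sign-consensus step the README mentions.

```python
import torch


def dare_delta(base: torch.Tensor, sft: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """DARE-style drop-and-rescale of one delta tensor (illustrative only).

    Randomly drops `drop_rate` of the delta parameters and rescales the
    survivors by 1 / (1 - drop_rate) so the expected update is preserved.
    """
    delta = sft - base                             # task vector from the SFT model
    keep = torch.rand_like(delta) >= drop_rate     # keep roughly (1 - drop_rate) of the entries
    return delta * keep / (1.0 - drop_rate)


def task_arithmetic_merge(base: torch.Tensor, deltas, weight: float = 1.0) -> torch.Tensor:
    """Task arithmetic: add the (sparsified) task vectors back onto the base weights."""
    merged = base.clone()
    for delta in deltas:
        merged = merged + weight * delta
    return merged


# Toy usage on a single weight matrix; a real merge repeats this for every parameter tensor.
torch.manual_seed(0)
base = torch.randn(1024, 1024)
sft_a = base + 0.01 * torch.randn_like(base)       # stand-ins for fine-tuned checkpoints
sft_b = base + 0.01 * torch.randn_like(base)
merged = task_arithmetic_merge(base, [dare_delta(base, sft_a), dare_delta(base, sft_b)])
```

With a drop rate of 0.9 to 0.99 this keeps only 1-10% of each delta, which is the "90-99% of delta parameters ... can be dropped" behaviour described in the README text.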