---
license: apache-2.0
datasets:
- athirdpath/Merge_Glue
---
|
|
|
### TeeZee/NEBULA-XB-v1.0_SFT_2_epoch ### |
|
|
|
An experiment: can DUS (Depth Up-Scaling) be taken one or more steps further?
|
|
|
|
|
### Technical notes: |
|
- the pretrained model NEBULA-XB-v1.0 was finetuned on 30k entries from the Merge_Glue dataset

- 18 layers were removed from each of the two copies of the finetuned GALAXY-XB-v03

- the resulting model has 108 layers: ((48-12)*2 - 18)*2 = 108 (see the sketch after this list)

- this is the second step of the scaled-up DUS procedure
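
The layer count above can be read as two successive depth up-scaling steps. A minimal sketch of that arithmetic, assuming a hypothetical `dus_layer_count` helper that is not part of the released code:

```python
def dus_layer_count(layers_per_copy: int, layers_removed: int) -> int:
    """One DUS step: drop `layers_removed` layers from each of the two copies
    of the source model, then stack the two trimmed copies."""
    return (layers_per_copy - layers_removed) * 2

# First step: 48-layer base -> GALAXY-XB-v03
galaxy_layers = dus_layer_count(48, 12)             # (48 - 12) * 2 = 72
# Second step: GALAXY-XB-v03 -> NEBULA-XB
nebula_layers = dus_layer_count(galaxy_layers, 18)  # (72 - 18) * 2 = 108

assert nebula_layers == 108
```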
|
|
|
|
|
### To evaluate |
|
- model performance after the merge; it should be a little lower than GALAXY finetuned on 50k entries of SlimOrca
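
A quick smoke test before running the full benchmark comparison, a minimal sketch assuming the standard `transformers` API (the comparison checkpoint and benchmark tasks are not fixed here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TeeZee/NEBULA-XB-v1.0_SFT_2_epoch"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Sanity check: the merged model should report the 108 layers described above.
assert model.config.num_hidden_layers == 108

# Short generation as a smoke test; benchmark scores against the GALAXY
# SlimOrca-50k finetune still need to be collected.
inputs = tokenizer("The DUS procedure scales a model by", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```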