---
license: apache-2.0
datasets:
- athirdpath/Merge_Glue
---

### TeeZee/NEBULA-XB-v1.0_SFT_2_epoch

### Experiment: can DUS be taken one or more steps further?

### Technical notes:
- pretrained NEBULA-XB-v1.0 model finetuned on 30k entries from the Merge_Glue dataset
- 18 layers removed from each of the two copies of finetuned GALAXY-XB-v03
- the model has 108 layers: (((48-12)*2)-18)*2 = 108, as sketched in the code below
- second step of the scaled-up DUS procedure

### To evaluate:
- model performance after the merge; it should be a little lower than GALAXY finetuned on 50k entries of slimorca
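
The 108-layer figure comes from applying the DUS duplicate-trim-stack step twice. Below is a minimal sketch of that arithmetic; the helper function is hypothetical and only illustrates the layer counts implied by the formula in the technical notes.

```python
# Sketch of the layer arithmetic behind the two DUS scaling steps described above.
# The helper name is hypothetical; the 48-layer starting point and the 12/18 trim
# sizes are taken from the formula in the technical notes.

def dus_layer_count(n_layers: int, n_removed: int) -> int:
    """One Depth Up-Scaling step: duplicate the model, drop n_removed layers
    from each copy, and stack the two trimmed copies."""
    return (n_layers - n_removed) * 2

galaxy_layers = dus_layer_count(48, 12)              # (48 - 12) * 2 = 72  (GALAXY-XB)
nebula_layers = dus_layer_count(galaxy_layers, 18)   # (72 - 18) * 2 = 108 (NEBULA-XB)
print(galaxy_layers, nebula_layers)                  # 72 108
```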