license: apache-2.0
datasets:
  - athirdpath/Merge_Glue

TeeZee/NEBULA-XB-v1.0_SFT_2_epoch

Experiment: can DUS (depth up-scaling) be taken one or more steps further?

Technical notes:

  • pretrained model NEBULA-XB-v1.0 finetuned on 30k entries from the Merge_Glue dataset
  • 18 layers removed from each of the two copies of finetuned GALAXY-XB-v03
  • model has 108 layers: (((48-12)*2)-18)*2 = 108
  • second step in scaling the DUS procedure
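
The layer arithmetic above can be checked with a minimal sketch. It assumes each DUS step removes the same number of layers from two copies of the model and then stacks the trimmed copies; the helper name `dus_step` is hypothetical, and the reading of 48 as the base layer count and 12 as the first-step removal is taken from the formula in the notes, not from any published config.

```python
def dus_step(num_layers: int, removed_per_copy: int) -> int:
    """Layer count after one DUS step (hypothetical helper).

    Assumption: the step trims `removed_per_copy` layers from each of
    two copies of the model and concatenates the two trimmed copies.
    """
    if not 0 <= removed_per_copy <= num_layers:
        raise ValueError("cannot remove more layers than the model has")
    return (num_layers - removed_per_copy) * 2


base_layers = 48                               # starting depth, from the formula
galaxy_layers = dus_step(base_layers, 12)      # first DUS step: (48-12)*2
nebula_layers = dus_step(galaxy_layers, 18)    # second DUS step: (72-18)*2

print(galaxy_layers, nebula_layers)
```

Written this way, each further DUS step is one more call to `dus_step` on the previous result, which makes it easy to see how depth grows with every repetition.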

To evaluate:

  • model performance after the merge; it should be a little lower than that of GALAXY finetuned on 50k entries of SlimOrca