desc_act?
#5, opened by jackboot
I notice that 128g is used without desc_act. In oobabooga's tests it was shown that group size alone gave higher perplexity than no grouping at all. I realize that using groups + desc_act makes the model less compatible with AutoGPTQ and other hardware, but the current setup is the worst of both worlds: higher memory usage and higher perplexity.
I think that uploading one model with only desc_act and no groups and another model with desc_act + 128g would be ideal to cover both cases and still have some benefit.
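To make the three variants concrete, here is a rough sketch of the settings I mean. The `QuantVariant` class and its `label` helper are hypothetical, made up just for illustration; the field names are meant to mirror the usual GPTQ options (`bits`, `group_size`, `desc_act`), with `group_size = -1` standing for no grouping. The 4-bit width is my assumption, since only the group size is mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantVariant:
    """Hypothetical container for the GPTQ settings discussed in this thread."""
    bits: int
    group_size: int  # -1 means no grouping
    desc_act: bool   # act-order: quantize columns by decreasing activation size

    def label(self) -> str:
        # Build a short name such as "4bit-128g-desc_act".
        parts = [f"{self.bits}bit"]
        if self.group_size > 0:
            parts.append(f"{self.group_size}g")
        if self.desc_act:
            parts.append("desc_act")
        return "-".join(parts)


BITS = 4  # assumption: the bit width is not stated in this thread

# What is uploaded now: groups without desc_act.
current = QuantVariant(bits=BITS, group_size=128, desc_act=False)

# What I am suggesting: one upload per case.
proposed = [
    QuantVariant(bits=BITS, group_size=-1, desc_act=True),   # desc_act only, no groups
    QuantVariant(bits=BITS, group_size=128, desc_act=True),  # desc_act + 128g
]

for variant in [current, *proposed]:
    print(variant.label())
```

If the quantization is done with AutoGPTQ, I believe the same three values go into its quantize config, but treat that mapping as an assumption and check it against the version you use.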
What do you think? I know it's not a big difference and I'm happy this exists at all.