T5XXL-Unchained
#6
by
woctordho
- opened
Would you consider using this text encoder in your model?
It's unnecessary to extend the T5 vocabulary. The link you shared is just a method to extend the T5 embedding vocabulary with randomly initialized tensors, and those new embeddings would have to be thoroughly trained before they carry any meaning.
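To make the objection concrete, here is a minimal sketch of what such vocab extension amounts to: appending random rows to the pretrained embedding table. The function name and shapes are illustrative, not from the linked method.

```python
import numpy as np

def extend_embedding(weights: np.ndarray, num_new_tokens: int, std: float = 0.02) -> np.ndarray:
    """Append randomly initialized rows to a pretrained embedding table.

    The appended rows carry no learned information, which is why an
    extended encoder must be retrained before the new tokens are usable.
    """
    rng = np.random.default_rng(0)
    dim = weights.shape[1]
    new_rows = rng.normal(0.0, std, size=(num_new_tokens, dim))
    return np.concatenate([weights, new_rows], axis=0)

# Toy example: T5-style vocab size, small embedding dim for the sketch.
emb = np.zeros((32128, 64))
bigger = extend_embedding(emb, 100)
```

The pretrained rows are copied unchanged; only the new tail is random, and that tail is exactly the part that needs training.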
This method is also redundant, because the MMDiT is already partially a language model (it's an expert model for both language and images), so finetuning only the MMDiT is sufficient.
Besides, the T5 architecture is numerically unstable: it's based on the old transformer architecture, which predates numerically stable QK norm and attention scaling. I don't want to exacerbate that problem.
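For reference, a minimal sketch of the QK-norm variant mentioned above (L2-normalizing queries and keys before the dot product, as used in modern diffusion transformers) — the function name and shapes are illustrative:

```python
import numpy as np

def qk_norm_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Scaled dot-product attention with QK normalization.

    L2-normalizing q and k bounds each logit to [-1, 1] before scaling,
    so the softmax input stays in a stable range even when activations
    grow large; original T5 attention applies neither this norm nor the
    1/sqrt(d) scale.
    """
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    logits = (q @ k.T) / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # standard stable-softmax shift
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Even with very large activation magnitudes, the output stays finite.
x = np.random.default_rng(1).normal(size=(4, 8)) * 1000.0
out = qk_norm_attention(x, x, x)
```

Without the normalization, logits on inputs of this magnitude would overflow `exp` long before the softmax shift could help.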