Support for xFormers and FlashAttention
#9
by
le723z
- opened
Hi, thanks for the great work! I found the model very helpful for my ongoing project. Do you have plans to add xFormers support for accelerated inference, as you recently did for gte-large-en-v1.5?
Best
Sorry, we have no plans to support xFormers for gte-Qwen2-1.5B-instruct. However, this model already supports flash-attention, and we believe the two offer similar inference speed.
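For anyone landing here: a minimal sketch of how flash-attention can be enabled when loading the model through Hugging Face `transformers`. The loading pattern below is the generic `transformers` API (`attn_implementation="flash_attention_2"`), not an official recipe from the model authors; it assumes `flash-attn` is installed and a supported GPU is available.

```python
# Sketch: build from_pretrained kwargs that select the FlashAttention-2
# backend in Hugging Face transformers. Illustration only, not the
# authors' official usage.
import torch


def fa2_load_kwargs(dtype: torch.dtype = torch.float16) -> dict:
    """Return kwargs for AutoModel.from_pretrained enabling FlashAttention-2.

    FlashAttention-2 kernels only run in half precision, so fp32 is rejected.
    """
    if dtype not in (torch.float16, torch.bfloat16):
        raise ValueError("flash_attention_2 requires float16 or bfloat16")
    return {
        "torch_dtype": dtype,
        "attn_implementation": "flash_attention_2",
        "trust_remote_code": True,
    }


# Usage (downloads the model, so it is kept commented out here):
# from transformers import AutoModel
# model = AutoModel.from_pretrained(
#     "Alibaba-NLP/gte-Qwen2-1.5B-instruct", **fa2_load_kwargs()
# )
```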