Fast inference engine

#2
by SinanAkkoyun - opened

Hello,
I understand why you can't use the Llama architecture, but please also open a vLLM PR when you release a new architecture, as DeepSeek does

Thank you