Fast inference engine

#2
by SinanAkkoyun - opened

Hello,
I understand why you can't use the Llama architecture, but please also open a vLLM PR when you release a new architecture, as DeepSeek does

Thank you