Converted GGUF-format model is very slow on inference (is that expected?)
#1 by bangbang - opened
I converted KoLLaVA-Synatra-7b to GGUF format, but the GGUF model is so slow that I thought I couldn't use it. (It is unusably slow.)
Can you tell me whether it is expected to be this slow?
How did you quantize it? e.g. Q8_0, Q4_K_M?
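For reference, a typical llama.cpp convert-and-quantize flow looks roughly like this. This is a sketch, not the exact steps the poster used: the script and binary names have changed across llama.cpp versions, and the model directory and output filenames below are placeholders.

```shell
# Convert the Hugging Face checkpoint to a 16-bit GGUF file.
# (Older llama.cpp releases shipped this as convert.py.)
python convert_hf_to_gguf.py ./KoLLaVA-Synatra-7b --outfile kollava-synatra-7b-f16.gguf

# Quantize the f16 file down to Q4_K_M, a common size/speed trade-off.
# (The binary was named ./quantize in older builds, ./llama-quantize in newer ones.)
./llama-quantize kollava-synatra-7b-f16.gguf kollava-synatra-7b-Q4_K_M.gguf Q4_K_M

# Sanity-check inference speed on a short prompt. If the model file is
# larger than available RAM/VRAM, the OS will swap and inference becomes
# unusably slow — a common cause of "so slow I can't use it".
./llama-cli -m kollava-synatra-7b-Q4_K_M.gguf -p "Hello" -n 32
```

If you skipped the quantize step and ran the f16 GGUF directly, or if you are running a 7B model on CPU with limited RAM, severe slowness is expected; quantizing to Q4_K_M or Q8_0 usually helps a lot.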