GGUF
I would be grateful if you could explain why there is still no GGUF version of the model. BNB 4-bit doesn't fit on a consumer-grade 24GB GPU, so it would be interesting to see Q3 or IQ3 quants. GGUF versions were made for Qwen2-VL, so are there technical challenges preventing a GGUF conversion for 2.5?
Actually, several llama.cpp forks have already implemented the Qwen2.5-VL architecture, and GGUF files were previously uploaded. However, some users loaded those GGUF files in applications such as upstream llama.cpp, LM Studio, or Ollama, which do not yet support this architecture, and then filed bug reports that became burdensome enough for the fork developers that they took down their GGUF model repositories. If you want to test it, you can find these forked llama.cpp repositories on GitHub and build a GGUF file yourself; a rough sketch of that workflow follows below.
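For reference, here is a minimal sketch of the build-it-yourself route, assuming the fork keeps upstream llama.cpp's CMake layout, its convert_hf_to_gguf.py script, and the llama-quantize tool. The fork URL and model directory below are placeholders; substitute the actual fork you find, and check its README, since vision models may also need the vision encoder converted to a separate mmproj GGUF.

```python
# Sketch: convert a local Qwen2.5-VL checkpoint to GGUF via a llama.cpp
# fork that supports the architecture. Assumes upstream's build layout,
# convert script, and quantize tool are present in the fork.
import subprocess
from pathlib import Path

FORK_URL = "https://github.com/<user>/llama.cpp"  # placeholder fork URL
MODEL_DIR = Path("Qwen2.5-VL-7B-Instruct")        # local HF checkpoint
OUT_F16 = Path("qwen2.5-vl-f16.gguf")
OUT_Q3 = Path("qwen2.5-vl-Q3_K_M.gguf")

def run(cmd, **kwargs):
    # Echo each command, then run it, raising on failure.
    print("+", " ".join(str(c) for c in cmd))
    subprocess.run([str(c) for c in cmd], check=True, **kwargs)

# 1. Clone and build the fork (upstream uses CMake).
run(["git", "clone", FORK_URL, "llama.cpp-fork"])
run(["cmake", "-B", "build"], cwd="llama.cpp-fork")
run(["cmake", "--build", "build", "--config", "Release"], cwd="llama.cpp-fork")

# 2. Convert the HF checkpoint to an f16 GGUF.
run(["python", "llama.cpp-fork/convert_hf_to_gguf.py", MODEL_DIR,
     "--outfile", OUT_F16, "--outtype", "f16"])

# 3. Quantize down to Q3_K_M so it fits on a 24GB consumer GPU.
run(["llama.cpp-fork/build/bin/llama-quantize", OUT_F16, OUT_Q3, "Q3_K_M"])
```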