Hello, can I run this model if I only have a 3090 with 24 GB VRAM and 32 GB RAM?
#1 opened by Humeee33
Is it that no one reads these help requests, or that no one cares, no one knows, or some combination?
@humeee33 Well, I wouldn't recommend the GPTQ one, since you don't have enough VRAM for a 70B model. Your best bet is to use llama.cpp (or llama-cpp-python) and download the GGUF version instead of the GPTQ.
You might be able to run it with Transformers, but it will be extremely slow, while llama.cpp will be much faster.
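Roughly what that looks like with llama-cpp-python, as a minimal sketch: the GGUF filename and the `n_gpu_layers` value below are just placeholders you'd swap for your actual file and tune for 24 GB of VRAM.

```python
# Minimal sketch: load a quantized 70B GGUF with partial GPU offload.
# The model path and layer count are hypothetical; adjust for your setup.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-70b-chat.Q4_K_M.gguf",  # local GGUF file (placeholder name)
    n_gpu_layers=40,   # offload as many layers as fit in VRAM; lower it if you OOM
    n_ctx=4096,        # context window
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```

Whatever doesn't fit on the GPU stays in system RAM, so with 32 GB you'll want one of the smaller quants (Q4 or below) and to experiment with `n_gpu_layers` until it stops running out of memory.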