How to fine-tune Guanaco and run it on a multi-core CPU (>32 cores)?
I am wondering: can we fine-tune Guanaco using QLoRA and then run the fine-tuned model on CPUs using llama.cpp? The missing piece is how to convert the fine-tuned model to a 4-bit format that llama.cpp can run.
To use QLoRA you should start from my fp16 repo of this model, TheBloke/guanaco-65B-HF. You can't run QLoRA on an already-quantised model like this one.
At least not yet. There is a PR in the AutoGPTQ repo to add PEFT/LoRA support to AutoGPTQ, which would allow fine-tuning on GPTQ models, but I don't think it's quite finished yet. Check the Pull Requests on the AutoGPTQ repo for more info.
As to running QLoRA from TheBloke/guanaco-65B-HF: I don't yet have personal experience with QLoRA, but here's an introductory video: https://youtu.be/8vmWGX1nfNM
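On the conversion question from the original post: the usual route is to merge the trained QLoRA adapter back into the fp16 base weights, then use llama.cpp's own conversion and quantisation tools. A rough sketch, where the paths, filenames and the `q4_0` quantisation choice are all assumptions:

```shell
# 1) Merge the LoRA adapter into the fp16 base weights first
#    (in Python with peft: PeftModel.from_pretrained(base, adapter),
#     then merge_and_unload() and save_pretrained to the merged dir)
MERGED_DIR=./guanaco-65b-merged

# 2) From the llama.cpp repo: convert the merged HF model to ggml fp16
python convert.py "$MERGED_DIR" --outfile guanaco-65b-f16.bin

# 3) Quantise to 4-bit so it fits in RAM and runs on CPU
./quantize guanaco-65b-f16.bin guanaco-65b-q4_0.bin q4_0

# 4) Run inference, using -t to spread work across your CPU cores
./main -m guanaco-65b-q4_0.bin -t 32 -p "Hello"
```

The key point is that llama.cpp can't consume the LoRA-on-4-bit training setup directly; you merge back to fp16 first and then quantise with llama.cpp's pipeline.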
If you have further questions, come to my Discord. Aemon, the creator of that video, is on my Discord and should be willing to help further if he's around.