TheBloke/Llama-2-7B-32K-Instruct-GPTQ

Text Generation

text-generation-inference

4-bit precision

OOM when quantizing for 32k context length

#3 opened almost 2 years ago by

Code looks for 'modeling_flash_llama.py' on Hugging Face even though I have it in a local folder

#2 opened almost 2 years ago by

Fine tuning this model further

#1 opened almost 2 years ago by