Extra slow when trying the instruction chat
#7
by timxx - opened
Compared to the official model and script, this one is too slow and uses too much VRAM (used as https://huggingface.co/blog/codellama describes).
Hi there! Do you have an example code snippet? You might be using the wrong inference dtype; make sure to load the model in float16.
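For reference, a minimal loading sketch in float16 (the model id below is an assumption for illustration, taken from the linked blog post; substitute the repo you are actually using):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id for illustration; replace with the repo you are using.
model_id = "codellama/CodeLlama-7b-Instruct-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Without torch_dtype, weights load in float32, which roughly doubles
# VRAM use and slows inference compared to float16.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)
```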
Thanks, adding torch_dtype=torch.float16 fixed the problem.
timxx changed discussion status to closed