Extremely slow when trying the instruction chat

#7
by timxx - opened

Compared to the official model and script, this one is too slow and uses too much VRAM (used as https://huggingface.co/blog/codellama describes).

Hi there! Do you have an example code snippet? You might be using the wrong dtype; make sure to load the model in float16.

Thanks, adding torch_dtype=torch.float16 fixed the problem.
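
For reference, a minimal sketch of the fix, assuming the `codellama/CodeLlama-7b-Instruct-hf` checkpoint from the linked blog post (the thread does not name the exact model). Without `torch_dtype`, `from_pretrained` loads the weights in float32, roughly doubling VRAM use and slowing inference:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id, based on the blog post linked above; substitute your own.
model_id = "codellama/CodeLlama-7b-Instruct-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # load weights in half precision to halve VRAM
    device_map="auto",          # place the model on available GPUs (needs accelerate)
)

prompt = "Write a function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```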

timxx changed discussion status to closed
