Extremely slow when trying the instruction chat

#7
by timxx - opened

Compared to the official model and script, this one is too slow and uses too much VRAM (used as https://huggingface.co/blog/codellama describes).

Hi there! Do you have an example code snippet? You might be using the wrong dtype; make sure to load the model in float16.

Thanks, adding torch_dtype=torch.float16 fixed the problem.
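
For reference, a minimal sketch of the fix, assuming the `codellama/CodeLlama-7b-Instruct-hf` checkpoint from the linked blog post (the thread does not name the exact model). Without `torch_dtype`, `from_pretrained` loads the weights in float32, roughly doubling VRAM use and slowing inference:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id, based on the blog post linked above; substitute your own.
model_id = "codellama/CodeLlama-7b-Instruct-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # load weights in half precision to halve VRAM
    device_map="auto",          # place the model on available GPUs (needs accelerate)
)

prompt = "Write a function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```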

timxx changed discussion status to closed
