GPU memory increases after each prompt

#7
by thanhpt93 - opened

Hi, I loaded the model with 15 GB of RAM, but GPU memory usage increases after each prompt. How can I release the memory? Thank you very much!

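I suggest calling torch.cuda.empty_cache() after every 2-3 prompts. It may not be the most effective solution, especially if the questions are not connected and the model needs to maintain context. Keep in mind that it only returns cached memory that is no longer in use; tensors your code still references (stored outputs, conversation history) stay allocated. If you can, using a GPU with more memory, such as the A100, A40, or L40, is a much better approach.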
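For anyone who wants to try this, here is a minimal sketch of the suggestion above. The helper names `release_gpu_cache` and `run_prompts`, the `generate` callable, and the `every=3` default are assumptions made for illustration; they are not part of this model's code or of PyTorch. The only PyTorch calls used are `torch.cuda.is_available()` and `torch.cuda.empty_cache()`.

```python
import gc

try:
    import torch
except ImportError:  # assumption: lets the sketch run on machines without PyTorch
    torch = None


def release_gpu_cache():
    """Collect unreachable Python objects, then return unused cached GPU blocks."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_prompts(prompts, generate, every=3):
    """Call generate(prompt) for each prompt, clearing the cache every `every` prompts.

    `generate` is a hypothetical callable that wraps your own model call.
    """
    answers = []
    for i, prompt in enumerate(prompts, start=1):
        answers.append(generate(prompt))
        if i % every == 0:
            release_gpu_cache()
    return answers
```

You would pass your own inference function as `generate`, for example `run_prompts(my_prompts, my_generate, every=2)`. If memory still grows, it is worth checking whether something in your loop keeps references to GPU tensors, since those cannot be released this way.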

@Tamnemtf Thanks for your solution, it works.

thanhpt93 changed discussion status to closed
