LLM Memory Calculator
Estimate the GPU memory needed to run inference with any LLM
You can try my Colab notebook, which lets you estimate model weight memory requirements (credit: Philip Schmid) and KV cache memory requirements: https://colab.research.google.com/drive/16m3Fz4Ocgp-saXUFNRa8zXyXA-y9JigX?usp=sharing
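For reference, here is a minimal sketch of the standard back-of-the-envelope formulas behind those two estimates (this is not the notebook's actual code, and the function names are mine): weight memory is parameter count times bytes per parameter, and KV cache memory scales with layers, KV heads, head dimension, and token count.

```python
def weights_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Model weights: parameter count x bytes per parameter (2 for FP16/BF16)."""
    return num_params * bytes_per_param / 1e9


def kv_cache_memory_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                       seq_len: int, batch_size: int = 1,
                       bytes_per_elem: float = 2.0) -> float:
    """KV cache: 2 (K and V) x layers x KV heads x head dim x tokens x bytes."""
    elems = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return elems * bytes_per_elem / 1e9


# e.g. a 405B-parameter model in BF16 needs ~810 GB just for weights
print(weights_memory_gb(405e9))  # 810.0
```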
Hi aHaric! I also think the 405B numbers are in FP32. According to my calculations, the 405B KV cache at 128K context should be in the ~66 GB ballpark.
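As a quick sanity check of that ballpark, plugging the published Llama 3.1 405B config (126 layers, 8 KV heads via GQA, head dim 128) into the formula above with a 16-bit KV cache and batch size 1 gives roughly that figure; the exact total depends on whether "128K" means 128,000 or 131,072 tokens.

```python
# KV cache for Llama 3.1 405B at 128K context, 16-bit (2 bytes), batch 1
layers, kv_heads, head_dim = 126, 8, 128
for seq_len in (128_000, 128 * 1024):
    bytes_total = 2 * layers * kv_heads * head_dim * seq_len * 2
    print(f"{seq_len} tokens: {bytes_total / 1e9:.1f} GB")
# 128000 tokens: 66.1 GB
# 131072 tokens: 67.6 GB
```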