
How large is BLOOM exactly, and how much GPU RAM is needed to load all the checkpoints?

#127
by mishavee - opened
  1. How large is BLOOM exactly, and how much GPU RAM is needed to load all the checkpoints?

  2. How much GPU RAM would be needed to load all the checkpoints and fine-tune it?

BigScience Workshop org

How large is BLOOM exactly, and how much GPU RAM is needed to load all the checkpoints?

You need 352 GB of GPU RAM to load the weights in bfloat16 on GPUs: BLOOM has about 176 billion parameters, and bfloat16 takes 2 bytes per parameter, so 176e9 × 2 bytes ≈ 352 GB.
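
As a quick sanity check, here is a minimal sketch of that arithmetic (the 176B parameter count is BLOOM's published size; the dtype byte sizes are standard):

```python
# Rough GPU memory needed just to hold BLOOM's weights, by dtype.
NUM_PARAMS = 176e9  # BLOOM has ~176 billion parameters

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gb = NUM_PARAMS * nbytes / 1e9
    print(f"{dtype}: ~{gb:.0f} GB")  # bfloat16 -> ~352 GB
```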

How much GPU RAM would be needed to load all the checkpoints and fine-tune it?

You never need to load all the checkpoints at once ... if you want to fine-tune, you have to take optimizer states into account. Luckily you can try DeepSpeed ZeRO offload, which essentially moves the memory footprint to other spaces (either CPU RAM or disk). @stas has written great documentation on how to use it in transformers: https://huggingface.co/docs/transformers/main_classes/deepspeed
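
To make the optimizer-state point concrete: with mixed-precision Adam, model states cost roughly 16 bytes per parameter (2 for the bf16 weights, 2 for gradients, and 12 for the fp32 master weights, momentum, and variance), so full fine-tuning of 176B parameters needs on the order of 2.8 TB for model states alone. A minimal ZeRO stage-3 CPU-offload setup might look like the sketch below; the specific values, and passing the config as a dict to `TrainingArguments`, are illustrative assumptions, so see the linked docs for the full set of options:

```python
from transformers import TrainingArguments

# Hypothetical minimal DeepSpeed ZeRO-3 config with CPU offload for both
# optimizer states and parameters; "auto" lets the transformers integration
# fill in values consistent with the TrainingArguments.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="out",      # illustrative path
    deepspeed=ds_config,   # transformers accepts a dict or a JSON file path
    bf16=True,
)
```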

So what is the smallest number of A100 80GB GPUs I need if I use DeepSpeed ZeRO offload?

BigScience Workshop org

So what is the smallest number of A100 80GB GPUs I need if I use DeepSpeed ZeRO offload?

The bare minimum is probably going to be 1 A100. It will be very slow, but it will run. Offloading just means that CPU memory / disk space is used as additional memory, so that you don't run out of GPU memory.
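
For a concrete per-GPU estimate under the different ZeRO-3 placement strategies, the transformers DeepSpeed guide linked above documents memory-estimator helpers. A sketch using the "cold" variant, which takes parameter counts rather than a loaded model so nothing huge has to be instantiated, might look like this (the `largest_layer_params` value is a rough assumption based on BLOOM's word-embedding matrix):

```python
from deepspeed.runtime.zero.stage3 import (
    estimate_zero3_model_states_mem_needs_all_cold,
)

# Prints estimated per-GPU / per-CPU memory needs for each ZeRO-3
# offload strategy, without loading the model itself.
# 176e9 total parameters; the largest single layer is assumed to be the
# word embeddings (~250,880 vocab x 14,336 hidden).
estimate_zero3_model_states_mem_needs_all_cold(
    total_params=176_000_000_000,
    largest_layer_params=250_880 * 14_336,
    num_gpus_per_node=1,
    num_nodes=1,
)
```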
