Clarification on Total Batch Size with 8xA100 GPUs

#3
by songll - opened

Compute
one 8xA100 machine

Batch size
Data parallel with a single gpu batch size of 8 for a total batch size of 32.

It’s stated that with data parallelism, each GPU handles a per-GPU batch size of 8, leading to a total batch size of 32 across 8 GPUs. However, my understanding is that if each of the 8 GPUs processes a batch of 8 samples, the total batch size should be 8 * 8 = 64. Could someone please clarify why the total batch size is noted as 32 instead of 64?
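As a quick sanity check of the arithmetic in the question (the function and numbers here are illustrative, not from the original card): with pure data parallelism, the total batch size is simply the per-GPU batch size times the number of data-parallel workers.

```python
# Illustrative only: total batch size under pure data parallelism.
def total_batch_size(per_gpu_batch: int, num_gpus: int) -> int:
    return per_gpu_batch * num_gpus

print(total_batch_size(8, 8))  # 64 -- what the question expects for 8 GPUs
print(total_batch_size(8, 4))  # 32 -- would match if only 4 workers were data-parallel
print(total_batch_size(4, 8))  # 32 -- would match if the per-GPU batch were actually 4
```

So a stated total of 32 with a per-GPU batch of 8 would be consistent with only 4 data-parallel workers (e.g. if the other GPUs were used for model or pipeline parallelism), or with a per-GPU batch of 4; which of these applies here is exactly what the question asks.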
