Clarification on Total Batch Size with 8xA100 GPUs
#3 opened by songll
> Compute: one 8xA100 machine
> Batch size: Data parallel with a single GPU batch size of 8 for a total batch size of 32.
It's stated that with data parallelism, each GPU handles a per-GPU batch size of 8, leading to a total batch size of 32 across 8 GPUs. However, if each of the 8 GPUs processes a batch of 8 samples, the total batch size should be 8 * 8 = 64. Could someone please clarify why the total batch size is noted as 32 instead of 64?
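For reference, here is a minimal sketch of the usual data-parallel batch-size arithmetic behind the question. The function and parameter names (`total_batch_size`, `per_gpu_batch_size`, `num_gpus`, `grad_accum_steps`) are illustrative, not taken from the model card; under this convention, the stated 32 would only match if either the per-GPU batch or the GPU count were halved (or no gradient accumulation convention explains the gap).

```python
# Illustrative sketch of data-parallel batch-size arithmetic.
# All names here are hypothetical, not from the model card.

def total_batch_size(per_gpu_batch_size: int,
                     num_gpus: int,
                     grad_accum_steps: int = 1) -> int:
    """Effective batch size per optimizer step under data parallelism:
    each GPU sees its own micro-batch, and gradients are averaged
    across GPUs (and across accumulation steps, if any)."""
    return per_gpu_batch_size * num_gpus * grad_accum_steps

# As described in the question: 8 GPUs, per-GPU batch of 8
print(total_batch_size(per_gpu_batch_size=8, num_gpus=8))  # 64

# Configurations that would instead yield the stated total of 32:
print(total_batch_size(per_gpu_batch_size=4, num_gpus=8))  # 32
print(total_batch_size(per_gpu_batch_size=8, num_gpus=4))  # 32
```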