Clarification on Total Batch Size with 8xA100 GPUs
#3 opened by songll
> Compute: one 8xA100 machine
> Batch size: Data parallel with a single GPU batch size of 8 for a total batch size of 32.
It's stated that with data parallelism, each GPU handles a per-GPU batch size of 8, leading to a total batch size of 32 across 8 GPUs. However, if each of the 8 GPUs processes a batch of 8 samples, the total batch size should be 8 * 8 = 64. Could someone please clarify why the total batch size is noted as 32 instead of 64?
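For reference, here is a minimal sketch of the usual data-parallel batch-size arithmetic behind the question. The function and parameter names (`total_batch_size`, `per_gpu_batch_size`, `num_gpus`, `grad_accum_steps`) are illustrative, not taken from the model card; under this convention, the stated 32 would only match if either the per-GPU batch or the GPU count were halved (or no gradient accumulation convention explains the gap).

```python
# Illustrative sketch of data-parallel batch-size arithmetic.
# All names here are hypothetical, not from the model card.

def total_batch_size(per_gpu_batch_size: int,
                     num_gpus: int,
                     grad_accum_steps: int = 1) -> int:
    """Effective batch size per optimizer step under data parallelism:
    each GPU sees its own micro-batch, and gradients are averaged
    across GPUs (and across accumulation steps, if any)."""
    return per_gpu_batch_size * num_gpus * grad_accum_steps

# As described in the question: 8 GPUs, per-GPU batch of 8
print(total_batch_size(per_gpu_batch_size=8, num_gpus=8))  # 64

# Configurations that would instead yield the stated total of 32:
print(total_batch_size(per_gpu_batch_size=4, num_gpus=8))  # 32
print(total_batch_size(per_gpu_batch_size=8, num_gpus=4))  # 32
```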