Zero Bubble Pipeline Parallelism

Zero Bubble Pipeline Parallelism is a novel pipeline parallelism algorithm that reduces pipeline bubbles to almost zero while preserving synchronous semantics.

Our paper is coming soon.

Try out our Megatron-based implementation at https://github.com/sail-sg/zero-bubble-pipeline-parallelism

Experiments show that zero bubble pipeline parallelism can accelerate training by up to 30% with similar memory consumption. A detailed table of experiments is coming soon.

Zero Bubble Schedules

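The key to achieving zero bubble is to break the backward pass into two passes: a B pass, which computes the gradient with respect to the input, and a W pass, which computes the gradient with respect to the weights. The B pass on one stage depends only on the B pass of the next stage, whereas in 1F1B it depends on both the B and the W of the next stage.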
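To make the split concrete, here is a minimal sketch for a single linear layer y = x @ w, written in plain Python so it runs without any dependencies. The function names (`forward`, `backward_b`, `backward_w`) and the toy shapes are illustrative assumptions and are not taken from the Megatron-based implementation.

```python
# Minimal sketch: split the backward pass of a linear layer y = x @ w
# into a B pass (input gradient) and a W pass (weight gradient).
# All names here are hypothetical, not the repository's API.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def forward(x, w):
    """F pass: y = x @ w."""
    return matmul(x, w)

def backward_b(grad_y, w):
    """B pass: grad_x = grad_y @ w^T.
    This is the only result the previous stage waits for."""
    return matmul(grad_y, transpose(w))

def backward_w(x, grad_y):
    """W pass: grad_w = x^T @ grad_y.
    No other stage depends on it, so it can be deferred."""
    return matmul(transpose(x), grad_y)

x = [[1.0, 2.0]]             # one sample, two input features
w = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 3.0]]        # 2 x 3 weight matrix
grad_y = [[1.0, 1.0, 1.0]]   # gradient arriving from the next stage

y = forward(x, w)
grad_x = backward_b(grad_y, w)   # can be sent to the previous stage right away
grad_w = backward_w(x, grad_y)   # can be scheduled later to fill a bubble
```

Because `backward_w` needs only the saved input and the incoming gradient, a scheduler is free to place it wherever a device would otherwise sit idle.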

image

Comparison of Schedules

  • 1F1B image
  • ZB1P image
  • ZB2P image
  • ZBV - Each device is assigned exactly 2 chunks (virtual stages); white text marks the first chunk and black text marks the second. The dependencies among model chunks follow a "V" shape for both the forward and backward passes. image
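One way to read the "V" shape in ZBV is sketched below. It assumes the 2p virtual stages are numbered 0 to 2p-1 in forward order, that the first p of them run down the devices, and that the remaining p run back up. The function name `zbv_device_for_stage` and this numbering are assumptions made for illustration, not the repository's API.

```python
def zbv_device_for_stage(virtual_stage, p):
    """Map a virtual stage index to a device under the assumed V-shaped placement.

    Stages 0..p-1 go to devices 0..p-1 (first chunk on each device);
    stages p..2p-1 go back from device p-1 to device 0 (second chunk).
    """
    if not 0 <= virtual_stage < 2 * p:
        raise ValueError("virtual_stage must be in [0, 2p)")
    if virtual_stage < p:
        return virtual_stage
    return 2 * p - 1 - virtual_stage

p = 4  # number of devices (pipeline stages)
placement = [zbv_device_for_stage(s, p) for s in range(2 * p)]
print(placement)  # [0, 1, 2, 3, 3, 2, 1, 0]
```

Under this reading, every device holds exactly two chunks, and device 0 holds both the first and the last virtual stage.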
Comparison assuming T_F = T_B = T_W:

| | 1F1B | ZB1P | ZB2P | ZBV (Recommended) |
|---|---|---|---|---|
| Bubble Rate | (p-1)/m | (p-1)/(3m) | 0 | 0 |
| Activation Memory (Compared to 1F1B) | 1x | 1x | 2x | 1x |
| Pipeline Communication Volume (Compared to 1F1B) | 1x | 1x | 1x | 2x |
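As a quick worked example, the bubble-rate row can be evaluated for a concrete configuration. The sketch below takes p as the number of pipeline stages and m as the number of microbatches, and reads the ZB1P entry as (p-1)/(3m). The helper `bubble_rate` is a hypothetical name, and the formulas hold only under the T_F = T_B = T_W assumption stated for the table.

```python
from fractions import Fraction

def bubble_rate(schedule, p, m):
    """Bubble rate from the comparison table, assuming T_F = T_B = T_W.

    p: number of pipeline stages, m: number of microbatches.
    """
    rates = {
        "1F1B": Fraction(p - 1, m),
        "ZB1P": Fraction(p - 1, 3 * m),
        "ZB2P": Fraction(0),
        "ZBV": Fraction(0),
    }
    return rates[schedule]

p, m = 8, 24
for name in ("1F1B", "ZB1P", "ZB2P", "ZBV"):
    print(name, bubble_rate(name, p, m))
# 1F1B 7/24
# ZB1P 7/72
# ZB2P 0
# ZBV 0
```

With 8 stages and 24 microbatches, ZB1P cuts the 1F1B bubble rate to a third at the same activation memory, while ZB2P and ZBV remove it at the cost of 2x activation memory or 2x communication volume respectively.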