Wan Xinyi
# Zero Bubble Pipeline Parallelism
Zero Bubble Pipeline Parallelism is a novel pipeline parallelism algorithm that reduces the pipeline bubble to almost zero while preserving synchronous training semantics.
Our paper is coming soon.
Try out our Megatron-based implementation at [https://github.com/sail-sg/zero-bubble-pipeline-parallelism](https://github.com/sail-sg/zero-bubble-pipeline-parallelism).
Experiments show that zero bubble pipeline parallelism can accelerate training by up to 30% with similar memory consumption. A detailed table of experiments is coming soon.
## Zero Bubble Schedules
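The key to achieving zero bubble is splitting the backward pass into a B pass, which computes the gradient with respect to the stage's input, and a W pass, which computes the gradient with respect to its parameters. The B pass on one stage then depends only on the B pass of the next stage, whereas in 1F1B, where B and W are fused into a single backward pass, each stage must wait for both B and W of the next stage.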
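
To make the split concrete, here is a minimal pure-Python sketch of a two-stage pipeline in which each stage is assumed to be a single bias-free linear layer `y = x @ w`. It is an illustration under that assumption, not the Megatron implementation, and all helper names (`matmul`, `forward`, `backward_b`, `backward_w`) are hypothetical. The B pass produces the gradient that the previous stage is waiting for; the W pass only produces the local weight gradient, so nothing downstream blocks on it.

```python
# Minimal sketch (assumption: each stage is one bias-free linear layer y = x @ w).
# Matrices are plain lists of rows so the example has no dependencies.


def matmul(a, b):
    """Return the matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def forward(x, w):
    """F pass: y = x @ w. The input x must be kept for the later W pass."""
    return matmul(x, w)


def backward_b(grad_y, w):
    """B pass: dL/dx = dL/dy @ w^T, the gradient sent to the previous stage."""
    return matmul(grad_y, transpose(w))


def backward_w(x, grad_y):
    """W pass: dL/dw = x^T @ dL/dy, needed only for the local weight update."""
    return matmul(transpose(x), grad_y)


# Two pipeline stages: stage 0 holds w0, stage 1 holds w1.
w0 = [[1.0, 0.0], [0.0, 1.0]]
w1 = [[1.0], [2.0]]
x0 = [[3.0, 4.0]]

h = forward(x0, w0)  # stage 0 F
y = forward(h, w1)   # stage 1 F
grad_y = [[1.0]]     # upstream gradient dL/dy, e.g. from the loss

# Critical path: stage 0's B needs only stage 1's B output.
grad_h = backward_b(grad_y, w1)   # stage 1 B, sent to stage 0
grad_x0 = backward_b(grad_h, w0)  # stage 0 B

# Off the critical path: the W passes can be scheduled later.
grad_w1 = backward_w(h, grad_y)   # stage 1 W
grad_w0 = backward_w(x0, grad_h)  # stage 0 W
```

In 1F1B the B and W halves of each stage run back to back as one backward pass; separating them lets a scheduler place W passes into slots that would otherwise be idle.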
![image](https://hackmd.io/_uploads/Bkc7CL7N6.png)
### Comparison of Schedules
* 1F1B
![image](https://hackmd.io/_uploads/Hkq-gD7N6.png)
* ZB1P
![image](https://hackmd.io/_uploads/Hy2GxwmEa.png)
* ZB2P
![image](https://hackmd.io/_uploads/S10QgvmV6.png)
* ZBV - Each device is assigned exactly 2 chunks (virtual stages); white text marks the first chunk and black text the second. The dependencies among model chunks follow a "V"-shaped pattern in both the forward and backward passes.
![image](https://hackmd.io/_uploads/Sk9uyY4ra.png)
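
The V-shaped placement can be written as a simple index mapping. The sketch below assumes the model is cut into 2p equal chunks numbered 0..2p-1 in forward order and spread over p devices; the function names are hypothetical and only illustrate the pattern described for ZBV.

```python
# Sketch of a V-shaped chunk placement (assumption: 2p equal chunks,
# numbered 0..2p-1 in forward order, on p devices).


def zbv_device(stage, p):
    """Device holding virtual stage `stage`: down 0..p-1, then back up p-1..0."""
    if not 0 <= stage < 2 * p:
        raise ValueError("stage must be in [0, 2p)")
    return stage if stage < p else 2 * p - 1 - stage


def chunks_on_device(device, p):
    """The two virtual stages assigned to `device`."""
    return [s for s in range(2 * p) if zbv_device(s, p) == device]


def cross_device_hops(p):
    """Stage boundaries per microbatch whose two sides live on different devices."""
    return sum(zbv_device(s, p) != zbv_device(s + 1, p) for s in range(2 * p - 1))


p = 4
print([zbv_device(s, p) for s in range(2 * p)])  # [0, 1, 2, 3, 3, 2, 1, 0]
print(chunks_on_device(0, p))                    # [0, 7]
print(cross_device_hops(p))                      # 6
```

Under this mapping device 0 holds both the first and the last chunk, and the turn at device p-1 is a local handoff. Counting boundaries this way gives 2(p-1) cross-device transfers per microbatch in each direction, versus p-1 for 1F1B, which is consistent with the 2x pipeline communication volume listed for ZBV.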
| Comparison (assuming T_F = T_B = T_W) | 1F1B | ZB1P | ZB2P | ZBV (Recommended) |
| ----------------------------------------------------- | ------- | -------- | ---- | --- |
| Bubble Rate | (p-1)/m | (p-1)/(3m) | 0 | 0 |
| Activation Memory <br> (Compared to 1F1B) | 1x | 1x | 2x | 1x |
| Pipeline Communication Volume <br> (Compared to 1F1B) | 1x | 1x | 1x | 2x |

Here p is the number of pipeline stages, m is the number of microbatches, and T_F, T_B, and T_W are the execution times of a single F, B, and W pass on one stage.
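
As a sanity check on the 1F1B column, the sketch below simulates a 1F1B schedule: each stage runs its ops in a fixed order, an F waits for the previous stage's F, and a fused backward waits for the next stage's backward. Assumptions: identical stages, zero communication latency, the fused backward modeled as one op of length T_B + T_W, and bubble rate taken to mean each stage's idle time divided by its busy time m(T_F + T_B + T_W), which reproduces the table's 1F1B entry. All function names are hypothetical. The ZB columns are simply copied from the table, since this sketch does not model separate B and W ops.

```python
from fractions import Fraction


def table_bubble_rate(schedule, p, m):
    """Bubble rates from the table, valid only when T_F = T_B = T_W."""
    return {
        "1F1B": Fraction(p - 1, m),
        "ZB1P": Fraction(p - 1, 3 * m),
        "ZB2P": Fraction(0),
        "ZBV": Fraction(0),
    }[schedule]


def one_f_one_b_order(stage, p, m):
    """Op order on one stage under 1F1B; "BW" is the fused backward (B + W)."""
    warmup = min(p - 1 - stage, m)
    ops = [("F", j) for j in range(warmup)]
    for j in range(m - warmup):  # steady state: one forward, one backward
        ops += [("F", warmup + j), ("BW", j)]
    ops += [("BW", j) for j in range(m - warmup, m)]  # cooldown
    return ops


def simulate_1f1b_bubble_rate(p, m, t_f=1, t_b=1, t_w=1):
    """Idle time per stage divided by busy time per stage, m * (T_F + T_B + T_W)."""
    duration = {"F": t_f, "BW": t_b + t_w}
    orders = [one_f_one_b_order(s, p, m) for s in range(p)]
    finish = {}     # (stage, kind, microbatch) -> finish time
    pos = [0] * p   # index of the next op on each stage
    free = [0] * p  # time at which each stage becomes idle
    remaining = sum(len(o) for o in orders)
    while remaining:
        progressed = False
        for s in range(p):
            if pos[s] == len(orders[s]):
                continue
            kind, j = orders[s][pos[s]]
            # F waits for the previous stage's F; BW waits for the next stage's BW.
            dep = (s - 1, "F", j) if kind == "F" else (s + 1, "BW", j)
            if 0 <= dep[0] < p and dep not in finish:
                continue  # dependency not finished yet; retry on the next sweep
            start = max(free[s], finish.get(dep, 0))
            finish[(s, kind, j)] = free[s] = start + duration[kind]
            pos[s] += 1
            remaining -= 1
            progressed = True
        if not progressed:
            raise RuntimeError("schedule deadlocked")
    busy = m * (t_f + t_b + t_w)
    return Fraction(max(free) - busy, busy)


for p, m in [(2, 2), (3, 3), (4, 8)]:
    print(p, m, simulate_1f1b_bubble_rate(p, m), table_bubble_rate("1F1B", p, m))
```

The last two columns printed should agree. Reproducing the ZB1P, ZB2P, or ZBV rows this way would require a schedule-specific op order with B and W as separate ops, which this sketch deliberately leaves out.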