Pipeline Parallelism with Controllable Memory

Pipeline Parallelism with Controllable Memory introduces a framework for designing pipeline schedules and uses that framework to find efficient schedules with lower activation memory.

Our findings show that approximately 1/3 of the 1F1B activation memory suffices under ideal conditions (F, B and W have the same runtime), and approximately 1/2 suffices to build a zero-bubble schedule in realistic scenarios (the necessary condition being W + 2B ≥ 2F and W + 2F ≥ 2B).
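The necessary condition above can be checked directly from measured pass times. The following is a minimal sketch, not part of the released code: the function name `zero_bubble_condition_holds` and its arguments are hypothetical, and `f`, `b`, `w` are assumed to be the per-micro-batch runtimes of the F, B and W passes in the same time unit.

```python
def zero_bubble_condition_holds(f: float, b: float, w: float) -> bool:
    """Check the necessary condition W + 2B >= 2F and W + 2F >= 2B.

    Hypothetical helper for illustration. The condition is necessary,
    not sufficient, so True does not by itself guarantee a zero-bubble schedule.
    """
    return (w + 2 * b >= 2 * f) and (w + 2 * f >= 2 * b)


if __name__ == "__main__":
    # Ideal case: F, B and W have the same runtime.
    print(zero_bubble_condition_holds(1.0, 1.0, 1.0))
    # Forward pass much slower than B and W: W + 2B < 2F.
    print(zero_bubble_condition_holds(3.0, 1.0, 0.5))
```

If the check fails for your measured timings, the condition stated above is violated and a zero-bubble schedule at 1/2 memory should not be expected.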

Check out our paper on arXiv.

| Method | 1F1B | V-Min | V-Half | V-ZB |
| --- | --- | --- | --- | --- |
| Bubble Rate (assuming T_F=T_B=T_W) | ~ p/m | ~ 2p/(3m) | ~ p/(2m) | 0 |
| Activation Memory (by #micro-batch) | p | (p+4)//3 | (p+2)//2 | p |

Bubble Rate here is calculated as 1 - (F+B+W)*m / longest_stage_time.
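To make the bubble-rate formula and the activation-memory entries concrete, here is a small sketch. The function names and the `ACTIVATION_MEMORY` dictionary are hypothetical and not part of the released code; `p` is assumed to be the number of pipeline stages, `m` the number of micro-batches, and `//` is assumed to mean integer floor division, as in Python.

```python
def bubble_rate(f: float, b: float, w: float, m: int,
                longest_stage_time: float) -> float:
    """Bubble rate computed as 1 - (F + B + W) * m / longest_stage_time."""
    return 1 - (f + b + w) * m / longest_stage_time


# Activation memory per method, in units of one micro-batch's activations.
# The expressions mirror the Activation Memory row of the table.
ACTIVATION_MEMORY = {
    "1F1B": lambda p: p,
    "V-Min": lambda p: (p + 4) // 3,
    "V-Half": lambda p: (p + 2) // 2,
    "V-ZB": lambda p: p,
}


def activation_memory(method: str, p: int) -> int:
    """Look up the activation memory (in micro-batches) for a schedule."""
    return ACTIVATION_MEMORY[method](p)


if __name__ == "__main__":
    # Example values chosen for illustration only, not measurements.
    print(bubble_rate(f=1.0, b=1.0, w=1.0, m=8, longest_stage_time=30.0))
    for name in ACTIVATION_MEMORY:
        print(name, activation_memory(name, p=8))
```

With `p = 8`, the table's expressions give 8 micro-batches for 1F1B and V-ZB, 4 for V-Min and 5 for V-Half.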