https://hgpu.org/?p=28410
Improving Automatic Parallel Training via Balanced Memory Workload Optimization