https://hgpu.org/?p=5651
Register packing for cyclic reduction: a case study