https://hgpu.org/?p=8865
OWL: Cooperative Thread Array Aware Scheduling Techniques for Improving GPGPU Performance