https://hgpu.org/?p=4854
A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads