https://hgpu.org/?p=27077
COX: Exposing CUDA Warp-Level Functions to CPUs