https://hgpu.org/?p=2912
MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores