https://hgpu.org/?p=8757
A Fast and Accurate GHT Implementation on CUDA