https://hgpu.org/?p=5293
Accelerating tetrahedral interpolation with data-level and thread-level parallel optimization