https://hgpu.org/?p=17747
Acceleration of tensor-product operations for high-order finite element methods