https://hgpu.org/?p=2938
Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model