https://hgpu.org/?p=25445
Parallel time integration using Batched BLAS (Basic Linear Algebra Subprograms) routines