https://hgpu.org/?p=10093
OpenCL API Extensions to achieve Multi-level Parallelism for Efficient Implementation of Strassen's Matrix Multiplication on GPUs