https://hgpu.org/?p=18417
Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures