https://hgpu.org/?p=18943
High-Performance Deep Learning via a Single Building Block