https://hgpu.org/?p=16942
Deep Convolutional Network evaluation on the Intel Xeon Phi: Where Subword Parallelism meets Many-Core