https://hgpu.org/?p=17895
An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures