https://hgpu.org/?p=18172
BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism