high performance computing on graphics processing units: hgpu.org

BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism

Nicolas Weber, Florian Schmidt, Mathias Niepert, Felipe Huici

NEC Laboratories Europe

arXiv:1804.08378 [cs.DC], (23 Apr 2018)

Project page: BrainSlug: Transparent Neural Network Acceleration (http://www.brainslug.info/)

Neural network frameworks such as PyTorch and TensorFlow are the workhorses of numerous machine learning applications, ranging from object recognition to machine translation. While these frameworks are versatile and straightforward to use, training and inference in deep neural networks are resource-intensive in terms of energy, compute, and memory. In contrast to recent work focusing on algorithmic enhancements, we introduce BrainSlug, a framework that transparently accelerates neural network workloads by replacing the default layer-by-layer processing with a depth-first approach. This reduces the amount of data each computation needs at a time and thus makes better use of the available hardware caches. BrainSlug achieves performance improvements of up to 41.1% on CPUs and 35.7% on GPUs. These optimizations come at no cost to the user: they require no hardware changes and only minor adjustments to the software.
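The core idea above, swapping layer-by-layer execution for depth-first execution, can be illustrated with a toy sketch in plain Python. This is not BrainSlug's implementation or API: the 1-D "tensor", the three layers (`affine`, `relu`, `maxpool2`) and the tile size are hypothetical choices made for illustration. The sketch only shows why pushing one small tile through several consecutive layers keeps the intermediate working set tile-sized instead of tensor-sized, while producing the same result.

```python
from typing import Callable, List, Sequence, Tuple

# A "layer" maps a 1-D list of activations to a new list (toy stand-in for a tensor op).
Layer = Callable[[List[float]], List[float]]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def affine(v: List[float]) -> List[float]:
    # Hypothetical batch-norm-like scale and shift with fixed parameters.
    return [2.0 * a - 1.0 for a in v]


def maxpool2(v: List[float]) -> List[float]:
    # Non-overlapping max pooling, window 2 / stride 2 (assumes len(v) is even).
    return [max(v[i], v[i + 1]) for i in range(0, len(v), 2)]


def layer_by_layer(x: Sequence[float], layers: Sequence[Layer]) -> Tuple[List[float], int]:
    """Default order: each layer processes the whole input before the next layer starts."""
    current = list(x)
    largest = len(current)  # size of the largest buffer that must be live at once
    for layer in layers:
        current = layer(current)  # full-size intermediate buffer
        largest = max(largest, len(current))
    return current, largest


def depth_first(x: Sequence[float], layers: Sequence[Layer], tile: int) -> Tuple[List[float], int]:
    """Depth-first order: push one small tile through all layers before loading the next.

    Assumes `tile` is a multiple of every pooling window, so no tile straddles a window.
    """
    out: List[float] = []
    largest = 0
    for start in range(0, len(x), tile):
        chunk = list(x[start:start + tile])
        largest = max(largest, len(chunk))
        for layer in layers:
            chunk = layer(chunk)  # intermediate stays tile-sized (cache-friendly)
        out.extend(chunk)
    return out, largest


if __name__ == "__main__":
    x = [float(i % 7) - 3.0 for i in range(1024)]
    net = [affine, relu, maxpool2]
    ref, ref_buf = layer_by_layer(x, net)
    dfs, dfs_buf = depth_first(x, net, tile=64)
    print(ref == dfs, ref_buf, dfs_buf)  # prints: True 1024 64
```

Both orders compute identical outputs, but the largest live buffer shrinks from the whole input (1024 values) to one tile (64 values), which is what lets the data stay in cache between layers. Real convolutions and overlapping pooling also need neighbouring values at tile borders, so a practical depth-first scheme has to handle those overlaps; the sketch sidesteps this by using only pointwise layers and non-overlapping pooling.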

Tags: AI, Artificial intelligence, cache, CNN, Computer science, cpu, CUDA, Deep learning, GPU, Machine learning, Neural and Evolutionary Computing, nVidia, nVidia GeForce GTX 1080 Ti

April 25, 2018 by hgpu
