high performance computing on graphics processing units: hgpu.org


maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs

Andrew Lavin

eBay Research Labs Machine Learning

arXiv:1501.06633 [cs.NE], (27 Jan 2015)

@article{lavin2015maxdnn,
  title={maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs},
  author={Lavin, Andrew},
  year={2015},
  month={jan},
  archivePrefix={arXiv},
  primaryClass={cs.NE}
}


This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures using a single kernel. The design combines ideas from cuda-convnet2 with the Maxas SGEMM assembly code. We address only the forward propagation (FPROP) operation of the network, but we believe that the same techniques will also be effective for backward propagation (BPROP).
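To make the efficiency figure concrete, the sketch below shows one common way to estimate the computational efficiency of a forward convolution: count the floating-point operations the layer requires and divide the achieved throughput by the device's peak throughput. This is an illustration, not necessarily the paper's exact definition. The function names, the layer shape, and the peak figure (2048 cores, one fused multiply-add per cycle, 1.126 GHz) are assumptions chosen for the example.

```python
def conv_forward_flops(batch, in_channels, out_channels,
                       out_h, out_w, kernel_h, kernel_w):
    """FLOPs for a direct forward convolution.

    Each output element needs in_channels * kernel_h * kernel_w
    multiply-accumulates, and each multiply-accumulate counts as 2 FLOPs.
    """
    macs = (batch * out_channels * out_h * out_w
            * in_channels * kernel_h * kernel_w)
    return 2 * macs


def computational_efficiency(flops, seconds, peak_flops_per_second):
    """Achieved throughput as a fraction of peak throughput."""
    return flops / (seconds * peak_flops_per_second)


# Assumed peak: 2048 cores * 2 FLOPs per cycle (one FMA) * 1.126e9 cycles/s.
ASSUMED_PEAK_FLOPS = 2048 * 2 * 1.126e9

# Hypothetical layer: batch 128, 64 -> 128 channels, 32x32 output, 3x3 filter.
layer_flops = conv_forward_flops(128, 64, 128, 32, 32, 3, 3)

# Hypothetical measured runtime; replace with a real timing.
measured_seconds = 4.5e-3
efficiency = computational_efficiency(
    layer_flops, measured_seconds, ASSUMED_PEAK_FLOPS)
print(f"{layer_flops} FLOPs, efficiency = {efficiency:.1%}")
```

With a real kernel, `measured_seconds` would come from a profiler or event timers, and the peak figure should match the clock the device actually sustains during the run.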

Tags: Computer science, CUDA, Machine learning, Neural and Evolutionary Computing, nVidia, nVidia GeForce GTX 980, Package

January 28, 2015 by hgpu


