high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Computer vision » Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu

Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA

arXiv:1705.10591 [cs.DC], (29 May 2017)

@article{chen2017optimizing,

title={Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs},

author={Chen, Xiaoming and Chen, Jianxu and Chen, Danny Z. and Hu, Xiaobo Sharon},

year={2017},

month={may},

archivePrefix={"arXiv"},

primaryClass={cs.DC}

}

Download (PDF)

View

Source

2479

views

Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even higher demand on fast convolution. The high computation throughput and memory bandwidth of graphics processing units (GPUs) make GPUs a natural choice for accelerating convolution operations. However, maximally exploiting the available memory bandwidth of GPUs for convolution is a challenging task. This paper introduces a general model to address the mismatch between the memory bank width of GPUs and computation data width of threads. Based on this model, we develop two convolution kernels, one for the general case and the other for a special case with one input channel. By carefully optimizing memory access patterns and computation patterns, we design a communication-optimized kernel for the special case and a communication-reduced kernel for the general case. Experimental data based on implementations on Kepler GPUs show that our kernels achieve 5.16X and 35.5% average performance improvement over the latest cuDNN library, for the special case and the general case, respectively.

Tags: CNN, Computer science, Computer vision, CUDA, Deep learning, Neural networks, nVidia, Tesla K40

June 1, 2017 by hgpu

Rating: 2.5/5. From 1 vote.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Your response

Recent source codes

Awesome LLM-Driven Kernel Generation

PhysProver: Advancing Automatic Theorem Proving for Physics

ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

SeedFold: Scaling Biomolecular Structure Prediction

Tilus: A Tile-Level GPU Kernel Programming Language

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs

BoltzGen:Toward Universal Binder Design

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution

MATLAB Tensor Core models

Most viewed papers (last 30 days)

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)