high performance computing on graphics processing units: hgpu.org

On the Use of Small 2D Convolutions on GPUs

Shams A.H. Al Umairy, Alexander S. van Amesfoort, Irwan D. Setija, Martijn C. van Beurden, Henk J. Sips

Delft University of Technology, Delft, The Netherlands

Computer Architecture, Lecture Notes in Computer Science, Volume 6161, pp. 52-64, 2012

DOI:10.1007/978-3-642-24322-6_6

Computing many small 2D convolutions using FFTs is a basis for a large number of applications in many domains in science and engineering, among them electromagnetic diffraction modeling in physics. The GPU architecture seems to be a suitable architecture to accelerate these convolutions, but reaching high application performance requires substantial development time and non-portable optimizations. In this work, we present the techniques, performance results and considerations to accelerate small 2D convolutions using CUDA, and compare performance to a multi-threaded CPU implementation. To improve programmability and performance of applications that make heavy use of small convolutions, we argue that two improvements to software and hardware are needed: FFT libraries must be extended with a single convolution function and communication bandwidth between CPU and GPU needs to be drastically improved.
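As a rough illustration of the FFT-based approach the abstract refers to, the sketch below computes one small 2D convolution through the convolution theorem: zero-pad both inputs, transform, multiply pointwise, and transform back. It is not the authors' CUDA implementation. It uses a naive pure-Python DFT so that it runs without any GPU or library, and the names `dft`, `dft2`, `conv2d_fft`, and `conv2d_direct` are hypothetical helpers introduced here. Wrapping the three steps in one call is also roughly what a "single convolution function" in an FFT library might look like, under that assumption.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        # The inverse transform carries the 1/n normalization.
        out.append(s / n if inverse else s)
    return out


def dft2(a, inverse=False):
    """2D DFT: transform every row, then every column."""
    rows = [dft(row, inverse) for row in a]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(row) for row in zip(*cols)]


def conv2d_fft(a, b):
    """Full linear 2D convolution of two real matrices via the DFT."""
    h = len(a) + len(b) - 1
    w = len(a[0]) + len(b[0]) - 1

    def pad(m):
        # Zero-pad to the full output size so that the circular
        # convolution computed by the DFT equals the linear one.
        return [[m[i][j] if i < len(m) and j < len(m[0]) else 0.0
                 for j in range(w)] for i in range(h)]

    fa, fb = dft2(pad(a)), dft2(pad(b))
    prod = [[fa[i][j] * fb[i][j] for j in range(w)] for i in range(h)]
    return [[v.real for v in row] for row in dft2(prod, inverse=True)]


def conv2d_direct(a, b):
    """Reference implementation: direct summation, no transforms."""
    h = len(a) + len(b) - 1
    w = len(a[0]) + len(b[0]) - 1
    out = [[0.0] * w for _ in range(h)]
    for i, row_a in enumerate(a):
        for j, va in enumerate(row_a):
            for k, row_b in enumerate(b):
                for l, vb in enumerate(row_b):
                    out[i + k][j + l] += va * vb
    return out
```

For the small sizes the paper targets, the two transfers of each input and output between CPU and GPU can cost as much as the transforms themselves, which is one way to read the abstract's call for better CPU-GPU communication bandwidth.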

Tags: Computer science, CUDA, Electrodynamics, FFT, nVidia, nVidia GeForce 8800 GTX, nVidia GeForce GTX 280, Optimization, Physics, Tesla C1060

March 15, 2012 by hgpu
