high performance computing on graphics processing units: hgpu.org


A comparative study of GPU programming models and architectures using neural networks

Vivek Pallipuram, Mohammad Bhuiyan, Melissa Smith

Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, USA

The Journal of Supercomputing (31 May 2011), pp. 1-46.

DOI:10.1007/s11227-011-0631-3

@article{pallipuramcomparative,
  title={A comparative study of GPU programming models and architectures using neural networks},
  author={Pallipuram, V.K. and Bhuiyan, M. and Smith, M.C.},
  journal={The Journal of Supercomputing},
  pages={1--46},
  publisher={Springer},
  year={2011}
}


Recently, General-Purpose Graphics Processing Units (GP-GPUs) have been identified as an intriguing technology for accelerating numerous data-parallel algorithms. Several GPU architectures and programming models are beginning to emerge and establish their niche in the High-Performance Computing (HPC) community. New massively parallel architectures such as Nvidia's Fermi and AMD/ATI's Radeon pack tremendous computing power into their large number of multiprocessors. Their performance is unleashed using one of two GP-GPU programming models: Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL). Both offer constructs and features that have a direct bearing on application runtime performance. In this paper, we compare the two GP-GPU architectures and the two programming models using a two-level character recognition network. The two-level network is developed using four different Spiking Neural Network (SNN) models, each with a different ratio of computation to communication requirements. To compare the architectures, we chose the two extremes of the SNN models for implementing the two-level network. The architectural performance comparison runs the SNN application on Nvidia's Fermi and AMD/ATI's Radeon using the OpenCL programming model, exhausting all of the optimization strategies plausible for the two architectures. To compare the programming models, we implement the two-level network on Nvidia's Tesla C2050, which is based on the Fermi architecture. We present a hierarchy of implementations in which we successively add optimization techniques associated with the two programming models. We then compare the two programming models at these different levels of implementation and also present the effect of the network size (problem size) on performance.
We report significant application speed-up, as high as 1095x for the most computation-intensive SNN neuron model, against a serial implementation on the Intel Core 2 Quad host. The comprehensive study presented in this paper establishes connections between programming models, architectures, and applications.
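For readers unfamiliar with the metric, a speed-up figure such as the one quoted above is conventionally the ratio of the serial runtime to the accelerated runtime. The sketch below illustrates that ratio only; the function name `speedup` and both timings are hypothetical and are not taken from the paper.

```python
def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Return the ratio of serial runtime to parallel runtime."""
    if serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("timings must be positive")
    return serial_seconds / parallel_seconds


# Hypothetical timings for illustration; NOT measurements from the paper.
serial_s = 120.0  # assumed serial CPU runtime, in seconds
gpu_s = 0.5       # assumed GPU runtime, in seconds

print(f"{speedup(serial_s, gpu_s):.0f}x")  # prints "240x"
```

Under this definition, a reported 1095x means the serial host implementation took 1095 times as long as the GPU implementation for the same network.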

Tags: ATI, Computer science, CUDA, Neural networks, nVidia, OpenCL, Performance, Tesla C2050

June 14, 2011 by hgpu
