high performance computing on graphics processing units: hgpu.org


An Environment to Support GPU and Multicore Programming for Rapid, High Performance, Application Deployment

James Laurence Brock

The Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts

Northeastern University, 2012

@phdthesis{brock2012environment,
  title={An Environment to Support GPU and Multicore Programming for Rapid, High Performance, Application Deployment},
  author={Brock, J.L.},
  school={NORTHEASTERN UNIVERSITY},
  year={2012}
}


Homogeneous multicore processors, heterogeneous multicore processors, high performance accelerators, and other heterogeneous architectures offer significantly more computing potential than traditional single core processors. Computer systems composed of these specialized processing elements are increasingly common. As these architectures have grown more complex, programming them has become more difficult and error prone. Each architecture has its own memory system, programming languages, and development environment. This has driven the need for portable programming APIs and tools that allow developers to easily exploit all of the computational power of these platforms and move their programs between different computing systems with little effort. To deal with these challenges, MIT Lincoln Laboratory developed the Parallel Vector Tile Optimizing Library (PVTOL) to simplify the task of portable programming for complex systems. The PVTOL Tasks and Conduits framework provides a set of high-level programming constructs for writing high performance code that is portable across a range of traditional and heterogeneous architectures. This research extends PVTOL to include support for Graphics Processing Units (GPUs) and heterogeneous computing architectures using both the NVIDIA Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL), while maintaining simplicity of programming and portability. We have demonstrated the utility of this framework by porting both a quantum Monte Carlo simulation and a 3D cone beam image reconstruction application to different systems consisting of various heterogeneous architectures. These applications have been ported from single CPU/GPU systems up to heterogeneous cluster architectures with as many as 24 nodes containing GPUs, showing significant speedup and scalability with minimal developer effort.
Using this framework, we have achieved total application run-time speedups of 115x for quantum Monte Carlo simulations on 24 distributed GPU nodes and 315x for 3D cone beam image reconstruction on 16 distributed GPU nodes, compared to multithreaded C code.
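
To put the two reported speedups on a comparable footing, the following minimal sketch divides each figure by its node count. The function name `speedup_per_node` is hypothetical and not part of the thesis or PVTOL. Because the baseline is multithreaded C code rather than a single GPU node, the result is only a rough normalization, not a parallel efficiency.

```python
def speedup_per_node(total_speedup: float, nodes: int) -> float:
    """Average speedup per GPU node relative to the multithreaded C baseline.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_speedup / nodes


# Figures as reported in the abstract: (application, speedup, GPU nodes).
reported = [
    ("quantum Monte Carlo", 115.0, 24),
    ("3D cone beam reconstruction", 315.0, 16),
]

for name, speedup, nodes in reported:
    print(f"{name}: {speedup_per_node(speedup, nodes):.2f}x per node")
```

Under this normalization, the image reconstruction result corresponds to a noticeably larger per-node gain than the Monte Carlo result, though the abstract alone does not say how much of that difference comes from the applications versus the hardware.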

Tags: ATI, ATI Radeon HD 5870, Computer science, CUDA, Heterogeneous systems, Image reconstruction, Monte Carlo simulation, nVidia, nVidia GeForce 9800 GX2, nVidia GeForce GTX 560 Ti, OpenCL, Tesla C1060, Tesla S1070, Thesis

October 26, 2012 by hgpu

