Nested Data-Parallelism on the GPU

hgpu.org » Programming » Algorithms » Nested Data-Parallelism on the GPU

Nested Data-Parallelism on the GPU

Lars Bergstrom, John Reppy

University of Chicago

University of Chicago, 2012

@article{bergstrom2012nested,

title={Nested Data-Parallelism on the GPU},

author={Bergstrom, L. and Reppy, J.},

year={2012}

}

Download (PDF)

View

Source

Source codes

Package:

NDP2GPU

1704

views

Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs, but, because of their Single-Instruction-Multiple-Data (SIMD) architecture, they are hard to program. Most of the programs ported to GPUs thus far use traditional data-level parallelism, performing only operations that operate uniformly over vectors. Porting algorithms that do not easily fit this model requires laborious manual program conversion to run programs with irregular control-flow or data layout. Language support for programming GPUs is rather limited. The low-level GPU languages, such as CUDA and OpenCL, appear general-purpose, but must be used in restricted styles to get adequate performance. Some effort has been made to support computation on GPUs from high-level languages, such as C++, X10, and Haskell, but these mechanisms are limited to computations over flat vectors. NESL is a first-order functional language that was designed by Guy Blelloch to allow programmers to write irregular-parallel programs – such as parallel divide-and-conquer algorithms – for wide-vector parallel computers. NESL supports nested data parallelism (NDP), where the elements of a data-parallel computation can themselves be data-parallel computations, and it uses a novel compilation technique, called flattening, to map nested computations onto a SIMD execution model. This work demonstrated that NDP can be effectively used for a wide range of irregular-parallel computations. Wide-vector machines pose many of the same programming challenges as GPUs, so it is worth exploring the question of whether an NDP language, like NESL, can make an effective tool for programming GPUs. This paper presents the results of such an exploration. We describe our port of the NESL implementation to work on GPUs and present empirical evidence that NDP on GPUs significantly outperforms CPU-based implementations and matches or beats newer GPU languages that support only flat parallelism. We also compare our performance with hand-tuned CUDA programs. While our performance does not match that of handtuned codes, we argue that the notational conciseness of NESL is worth the loss in performance. This work provides the first language implementation that directly supports NDP on a GPU.

Tags: Algorithms, Compilers, Computer science, CUDA, Data parallelism, nVidia, Package, Programming Languages, Tesla C2050

March 31, 2012 by hgpu

Rating: 1.5/5. From 2 votes.

Please wait...

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

* * *

high performance computing on graphics processing units: hgpu.org