high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Nested Data-Parallelism on the GPU

Nested Data-Parallelism on the GPU

Lars Bergstrom, John Reppy

University of Chicago

University of Chicago, 2012

@article{bergstrom2012nested,

title={Nested Data-Parallelism on the GPU},

author={Bergstrom, L. and Reppy, J.},

year={2012}

}

Download (PDF)

View

Source

Source codes

Package:

NDP2GPU

2151

views

Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs, but, because of their Single-Instruction-Multiple-Data (SIMD) architecture, they are hard to program. Most of the programs ported to GPUs thus far use traditional data-level parallelism, performing only operations that operate uniformly over vectors. Porting algorithms that do not easily fit this model requires laborious manual program conversion to run programs with irregular control-flow or data layout. Language support for programming GPUs is rather limited. The low-level GPU languages, such as CUDA and OpenCL, appear general-purpose, but must be used in restricted styles to get adequate performance. Some effort has been made to support computation on GPUs from high-level languages, such as C++, X10, and Haskell, but these mechanisms are limited to computations over flat vectors. NESL is a first-order functional language that was designed by Guy Blelloch to allow programmers to write irregular-parallel programs – such as parallel divide-and-conquer algorithms – for wide-vector parallel computers. NESL supports nested data parallelism (NDP), where the elements of a data-parallel computation can themselves be data-parallel computations, and it uses a novel compilation technique, called flattening, to map nested computations onto a SIMD execution model. This work demonstrated that NDP can be effectively used for a wide range of irregular-parallel computations. Wide-vector machines pose many of the same programming challenges as GPUs, so it is worth exploring the question of whether an NDP language, like NESL, can make an effective tool for programming GPUs. This paper presents the results of such an exploration. We describe our port of the NESL implementation to work on GPUs and present empirical evidence that NDP on GPUs significantly outperforms CPU-based implementations and matches or beats newer GPU languages that support only flat parallelism. We also compare our performance with hand-tuned CUDA programs. While our performance does not match that of handtuned codes, we argue that the notational conciseness of NESL is worth the loss in performance. This work provides the first language implementation that directly supports NDP on a GPU.

Tags: Algorithms, Compilers, Computer science, CUDA, Data parallelism, nVidia, Package, Programming Languages, Tesla C2050

March 31, 2012 by hgpu

Rating: 1.5/5. From 2 votes.

Please wait...

Your response

You must be logged in to post a comment.

high performance computing on graphics processing units: hgpu.org

Nested Data-Parallelism on the GPU

Package:

Your response

Recent source codes

Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR

Pinocchio: PINpointing Orbit Crossing Collapsed Hierarchical Objects

KernelCoder: trained on a curated dataset of reasoning traces and CUDA kernel pairs

VibeCodeHPC - Multi Agentic Vibe Coding for HPC

Compile-Time Resource Safety for GPU APIs: A Low-Overhead Typestate Framework

exa-AMD: Exascale Accelerated Materials Discovery

TRUST: a thermalhydraulic software package for CFD simulations

Modular: The Modular Platform (includes MAX & Mojo)

Allo: Accelerator Design Language

Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization

Most viewed papers (last 30 days)

Nested Data-Parallelism on the GPU

Package:

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)