high performance computing on graphics processing units: hgpu.org

A Light-weight API for Portable Multicore Programming

Christopher G. Baker, Michael A. Heroux, H. Carter Edwards, Alan B. Williams

Scalable Algorithms Department, Sandia National Laboratories, P.O. Box 5800, MS 1320, Albuquerque, NM 87185-1320, USA

18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2010

DOI:10.1109/PDP.2010.49

@conference{baker2010light,
  title={A light-weight API for portable multicore programming},
  author={Baker, C.G. and Heroux, M.A. and Edwards, H.C. and Williams, A.B.},
  booktitle={2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing},
  pages={601--606},
  issn={1066-6192},
  year={2010},
  organization={IEEE}
}


Multicore nodes have become ubiquitous in just a few years. At the same time, writing portable parallel software for multicore nodes is extremely challenging. Widely available programming models such as OpenMP and Pthreads are not useful for devices such as graphics cards, and more flexible programming models such as RapidMind are only available commercially. OpenCL represents the first truly portable standard, but its availability is limited. In the presence of such a transition, we have developed a minimal application programming interface (API) for multicore nodes that allows us to write portable parallel linear algebra software that can use any of the aforementioned programming models and any future standard models. We utilize C++ template meta-programming to enable users to write parallel kernels that can be executed on a variety of node types, including Cell, GPUs and multicore CPUs. The support for a parallel node is provided by implementing a Node object, according to the requirements specified by the API. This ability to provide custom support for particular node types gives developers a level of control not allowed by the current slate of proprietary parallel programming APIs. We demonstrate implementations of the API for a simple vector dot-product on sequential CPU, multicore CPU and GPU nodes.
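The abstract's idea of a templated kernel executed through a Node object can be illustrated with a minimal C++ sketch. This is not the paper's actual API: the names `DotOp`, `SerialNode`, `ThreadNode`, `parallel_reduce`, `identity`, `generate` and `reduce` are assumptions chosen for illustration, and the multicore node uses plain `std::thread` rather than any of the programming models named above. The sketch shows one dot-product kernel running unchanged on a sequential node and on a threaded node.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical kernel: a dot product written as a reduction over an index range.
struct DotOp {
    const double* x;
    const double* y;
    using ReductionType = double;
    static double identity() { return 0.0; }                      // neutral element
    double generate(std::size_t i) const { return x[i] * y[i]; }  // per-index contribution
    static double reduce(double a, double b) { return a + b; }    // combine two partial results
};

// Hypothetical sequential node: runs the reduction in a single loop.
struct SerialNode {
    template <class Op>
    typename Op::ReductionType parallel_reduce(std::size_t begin, std::size_t end,
                                               const Op& op) const {
        typename Op::ReductionType result = Op::identity();
        for (std::size_t i = begin; i < end; ++i)
            result = Op::reduce(result, op.generate(i));
        return result;
    }
};

// Hypothetical multicore node: one contiguous chunk of the range per std::thread.
struct ThreadNode {
    unsigned num_threads;

    template <class Op>
    typename Op::ReductionType parallel_reduce(std::size_t begin, std::size_t end,
                                               const Op& op) const {
        using T = typename Op::ReductionType;
        const unsigned n = num_threads == 0 ? 1 : num_threads;
        const std::size_t len = end - begin;
        std::vector<T> partial(n, Op::identity());  // one slot per worker, no sharing
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < n; ++t) {
            const std::size_t lo = begin + len * t / n;
            const std::size_t hi = begin + len * (t + 1) / n;
            workers.emplace_back([&partial, &op, lo, hi, t] {
                T local = Op::identity();
                for (std::size_t i = lo; i < hi; ++i)
                    local = Op::reduce(local, op.generate(i));
                partial[t] = local;
            });
        }
        for (auto& w : workers) w.join();
        T result = Op::identity();
        for (const T& p : partial) result = Op::reduce(result, p);  // combine chunk results
        return result;
    }
};

// Node-agnostic user code: the node type is a template parameter.
template <class Node>
double dot(const Node& node, const std::vector<double>& x, const std::vector<double>& y) {
    DotOp op{x.data(), y.data()};
    return node.parallel_reduce(0, x.size(), op);
}
```

Calling `dot(SerialNode{}, x, y)` and `dot(ThreadNode{4}, x, y)` runs the same kernel on both nodes; under this design a GPU node would be one more type providing the same `parallel_reduce` member. On some Linux toolchains the threaded node needs the `-pthread` flag at build time.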

Tags: Computer science, CUDA, Linear Algebra, nVidia, nVidia GeForce GTX 280, OpenCL, RapidMind

April 3, 2011 by hgpu
