high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Accelerating Haskell Array Codes with Algorithmic Skeletons on GPUs

Accelerating Haskell Array Codes with Algorithmic Skeletons on GPUs

Sean Youngsung Lee

School of Computer Science and Engineering, The University of New South Wales

The University of New South Wales, 2011

BibTeX

Download (PDF)

View

Source

2371

views

GPUs have been gaining popularity as general purpose parallel processors that deliver a performance to cost ratio superior to that of CPUs. However, programming on GPUs has remained a specialised area, as it often requires significant knowledge about the GPU architecture and platform-specific parallelisation of the algorithms that are implemented. Furthermore, the dominant programming models on GPUs limit functional decomposition of programs, as they require programmers to write separate functions to run on GPUs. I present and quantitatively evaluate a GPU programming system that provides a high-level abstraction to facilitate the use of GPUs for general purpose array processing. The presented programming system liberates programmers of low-level details and enables functional decomposition of programs, whilst providing the facilities to sufficiently exploit the parallel processing capability of GPUs. Fundamentally, the presented programming system allows programmers to focus on what to program on GPUs instead of how to program GPUs. The approach is based on algorithmic skeletons mapped to higher-order functions, which are commonly available in functional programming languages. The presented programming system (1) encapsulates the low-level control of GPUs and the GPU-specific parallelisation of the higher-order functions in algorithmic skeletons, (2) employs an embedded domain specific language as the interface that treats the parallel operations as expressions with parallel semantics, allowing functional decomposition of programs, and (3) manages the compilation of the embedded domain specific language into algorithmic skeleton instances and the execution of the algorithmic skeleton instances online. Programmers only need to write the operations to be executed on GPUs as expressions in the embedded domain specific language, and the presented programming system takes care of the rest. Although the concrete implementation presented in the thesis involves an embedded domain specific language in Haskell, Accelerate, and a specific GPU platform, CUDA, the approach in this thesis can be applied to other similar configurations with appropriate adjustments.

Tags: Algorithms, Computer science, CUDA, nVidia, Tesla S1070, Thesis

December 18, 2012 by hgpu

No votes yet.

Please wait...

high performance computing on graphics processing units: hgpu.org

Accelerating Haskell Array Codes with Algorithmic Skeletons on GPUs

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)

Accelerating Haskell Array Codes with Algorithmic Skeletons on GPUs

Share this:

Recent source codes

Most viewed papers (last 30 days)