high performance computing on graphics processing units: hgpu.org

Accelerating Haskell array codes with multicore GPUs

Manuel M.T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, Vinod Grover

University of New South Wales, Sydney, Australia

Proceedings of the sixth workshop on Declarative aspects of multicore programming, DAMP ’11, 2011

DOI:10.1145/1926354.1926358

Current GPUs are massively parallel multicore processors optimised for workloads with a large degree of SIMD parallelism. Good performance requires highly idiomatic programs, whose development is work intensive and requires expert knowledge. To raise the level of abstraction, we propose a domain-specific high-level language of array computations that captures appropriate idioms in the form of collective array operations. We embed this purely functional array language in Haskell with an online code generator for NVIDIA’s CUDA GPGPU programming environment. We regard the embedded language’s collective array operations as algorithmic skeletons; our code generator instantiates CUDA implementations of those skeletons to execute embedded array programs. This paper outlines our embedding in Haskell, details the design and implementation of the dynamic code generator, and reports on initial benchmark results. These results suggest that we can compete with moderately optimised native CUDA code, while enabling much simpler source programs.
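To make the idea of an embedded array language with skeleton-based code generation concrete, here is a minimal sketch in plain Haskell (base library only). It is not the paper's API: the types `Exp` and `Acc`, the functions `run`, `showExp`, `zipWithKernel`, and `dotp`, and the shape of the generated kernel text are all hypothetical names chosen for illustration. The sketch assumes arrays of `Float` represented as lists, evaluates programs with a reference interpreter instead of a GPU, and shows only how a scalar function could be spliced into a fixed CUDA template for one collective operation.

```haskell
-- Toy deep embedding of collective array operations.
-- Every name below is an assumption made for this sketch.

-- Scalar expressions passed to collective operations.
data Exp
  = Var String        -- a named operand, used only when printing code
  | Lit Float
  | Add Exp Exp
  | Mul Exp Exp

-- Array computations built from collective operations.
data Acc
  = Use [Float]                          -- embed a host array
  | Map (Exp -> Exp) Acc
  | ZipWith (Exp -> Exp -> Exp) Acc Acc
  | Fold (Exp -> Exp -> Exp) Exp Acc     -- result is a one-element array

-- Reference interpreter for scalar expressions.
evalExp :: Exp -> Float
evalExp (Lit x)   = x
evalExp (Add a b) = evalExp a + evalExp b
evalExp (Mul a b) = evalExp a * evalExp b
evalExp (Var v)   = error ("unbound variable: " ++ v)

-- Reference interpreter for array computations (runs on the CPU).
run :: Acc -> [Float]
run (Use xs)        = xs
run (Map f a)       = [evalExp (f (Lit x)) | x <- run a]
run (ZipWith f a b) =
  zipWith (\x y -> evalExp (f (Lit x) (Lit y))) (run a) (run b)
run (Fold f z a)    =
  [foldl (\acc x -> evalExp (f (Lit acc) (Lit x))) (evalExp z) (run a)]

-- Print a scalar expression as C source text.
showExp :: Exp -> String
showExp (Var v)   = v
showExp (Lit x)   = show x ++ "f"
showExp (Add a b) = "(" ++ showExp a ++ " + " ++ showExp b ++ ")"
showExp (Mul a b) = "(" ++ showExp a ++ " * " ++ showExp b ++ ")"

-- Instantiate a fixed kernel template (the "skeleton") with a scalar function.
zipWithKernel :: (Exp -> Exp -> Exp) -> String
zipWithKernel f = unlines
  [ "__global__ void zipWith(const float *xs, const float *ys, float *out, int n) {"
  , "  int i = blockIdx.x * blockDim.x + threadIdx.x;"
  , "  if (i < n) out[i] = " ++ showExp (f (Var "xs[i]") (Var "ys[i]")) ++ ";"
  , "}"
  ]

-- Dot product expressed with two collective operations.
dotp :: [Float] -> [Float] -> Acc
dotp xs ys = Fold Add (Lit 0) (ZipWith Mul (Use xs) (Use ys))

main :: IO ()
main = do
  print (run (dotp [1, 2, 3] [4, 5, 6]))  -- prints [32.0]
  putStr (zipWithKernel Mul)              -- prints the instantiated kernel text
```

The point of the design is that the source program (`dotp`) mentions only collective operations and scalar functions, while everything GPU-specific lives in the template. A real system along the lines the abstract describes would also have to compile, load, and launch the generated kernels and manage device memory, none of which this sketch attempts.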

Tags: Code generation, Computer science, CUBLAS, CUDA, Data parallelism, nVidia, Performance, Programming Languages, Tesla T10

August 22, 2011 by hgpu
