high performance computing on graphics processing units: hgpu.org

Automatic GPU optimization through higher-order functions in functional languages

John Wikman

KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science

KTH Royal Institute of Technology, 2020

Over recent years, graphics processing units (GPUs) have become popular devices for procedures that exhibit data parallelism. Because of their high parallel capability, running such procedures on a GPU can yield speedups ranging from a few times to several orders of magnitude compared to serial execution on a central processing unit (CPU). Interfaces such as CUDA and OpenCL flexibly expose the parallel capabilities of the GPU to the programmer, while placing much of the responsibility for aspects such as thread synchronization and memory management on the programmer. A different approach to GPU optimization is to enable it through higher-order functions with known data parallelism, using the semantics of the higher-order function to determine the parallel execution. In practice, this approach has been integrated into existing languages through libraries or built directly into languages themselves. However, higher-order functions do not address when it is beneficial to execute on a GPU. Because the GPU is a separate device, effects such as latency and memory transfer can cause a slowdown for small inputs. In this thesis, a set of commonly used higher-order functions is GPU-enabled as compiler intrinsics in a small functional language. These higher-order functions are also equipped with the option of automatically deciding at runtime whether to execute on the GPU or the CPU. Results show that running higher-order functions on the GPU yields a speedup for larger computations. However, the performance does not match existing solutions that provide additional higher-order functions for optimizing the parallelization. The selected approach for automatically deciding whether to run a higher-order function on the GPU or the CPU picks the faster option in a majority of cases. The most notable benefit of automatic decisions, however, was for procedures that use multiple higher-order function invocations, which ran faster than when executing only on the GPU or only on the CPU.
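
The idea of a higher-order function that decides at runtime where to execute can be illustrated with a minimal Python sketch. The abstract does not describe the thesis's actual decision procedure, so everything here is an assumption made for illustration: the linear cost model, the constants, and the names `estimate_cpu`, `estimate_gpu`, `choose_device`, and `auto_map` are all hypothetical, and the "GPU" path is a plain stand-in rather than a real kernel launch.

```python
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Hypothetical cost-model constants in arbitrary time units.
# They are illustrative only and are not measurements from the thesis.
GPU_LAUNCH_OVERHEAD = 50.0    # fixed latency of involving a separate device
GPU_TRANSFER_PER_ELEM = 0.01  # copying one element to and from the device
GPU_WORK_PER_ELEM = 0.001     # per-element work when spread across GPU threads
CPU_WORK_PER_ELEM = 0.1       # per-element work when executed serially


def estimate_cpu(n: int) -> float:
    """Estimated cost of mapping over n elements serially on the CPU."""
    return CPU_WORK_PER_ELEM * n


def estimate_gpu(n: int) -> float:
    """Estimated cost on the GPU: fixed overhead plus transfer and work."""
    return GPU_LAUNCH_OVERHEAD + (GPU_TRANSFER_PER_ELEM + GPU_WORK_PER_ELEM) * n


def choose_device(n: int) -> str:
    """Pick the device with the lower estimated cost for an input of size n."""
    return "gpu" if estimate_gpu(n) < estimate_cpu(n) else "cpu"


def map_cpu(f: Callable[[A], B], xs: List[A]) -> List[B]:
    """Serial map on the CPU."""
    return [f(x) for x in xs]


def map_gpu(f: Callable[[A], B], xs: List[A]) -> List[B]:
    """Stand-in for a GPU map. A real backend would compile f to a kernel and
    launch it; here the result is computed on the CPU so the sketch runs."""
    return [f(x) for x in xs]


def auto_map(f: Callable[[A], B], xs: List[A]) -> Tuple[str, List[B]]:
    """Map f over xs on whichever device the cost model prefers.
    Returns the chosen device together with the result."""
    device = choose_device(len(xs))
    result = map_gpu(f, xs) if device == "gpu" else map_cpu(f, xs)
    return device, result


if __name__ == "__main__":
    for size in (100, 10_000):
        device, _ = auto_map(lambda x: x * x, list(range(size)))
        print(size, device)
```

Under these made-up constants the fixed overhead dominates for small inputs, so short lists stay on the CPU and long lists go to the GPU, which mirrors the slowdown for small inputs that the abstract attributes to latency and memory transfer.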

Tags: Computer science, CUDA, nVidia, nVidia Quadro P 2000, Optimization, Thesis

November 15, 2020 by hgpu
