high performance computing on graphics processing units: hgpu.org


Abstractions for Programming Graphics Processors in High-Level Programming Languages

Tim Besard

Ghent University

Ghent University, 2019


Package:

CUDAnative.jl: Julia support for native CUDA programming


Software development has long been based on hardware that grows exponentially faster, which has allowed application complexity to increase accordingly. This free lunch is over, however, and traditional CPUs (Central Processing Units) don’t double their performance every couple of years anymore. As a result, compute-intensive applications have increasingly been relying on hardware accelerators like GPUs (Graphics Processing Units) to satisfy their computational demands. At the same time, the demand for powerful and complex applications remains. Traditional programming languages, like C and C++, are ill-suited to meet this demand since they require significant programmer expertise. Instead, high-level languages like Python and MATLAB have been gaining popularity as they allow the programmer to focus on application logic and not care about how that application will be executed. However, it is difficult to use high-level languages to program hardware accelerators like GPUs. First and foremost, the design of these languages often relies on techniques that are not compatible with external accelerators, such as interpretation and dynamic compilation. At the same time, hardware accelerators are mostly used to improve performance, while high-level languages focus on productivity even at the expense of performance. As a result, accelerators are typically programmed with low-level languages that guarantee good performance, in combination with a high-level language to implement the remaining application logic. Partitioning an application into two or more languages, where one language is used to implement performance-sensitive code and another for the application logic, causes many problems. The use of multiple languages essentially introduces a barrier that prevents reuse, abstraction and optimization of code, and also complicates communication between programmers working on different parts of the codebase.
In this dissertation, we present an alternative approach where a single high-level language is used to implement both application logic and GPU code. We do so while maintaining the productivity of the high-level language, and without sacrificing the performance of code when it is executed on the hardware accelerator. We start from the existing Julia programming language, a high-level, general-purpose language that was specifically designed for efficient execution on CPUs. To execute Julia code on an accelerator like a GPU, we need to add a back end to its compiler. This part of the language implementation is responsible for lowering source code to executable machine code. Traditionally, alternative back ends are either added to and integrated with the existing compiler, or implemented as a wholly separate compiler specific to one particular accelerator. As part of this research, we define interfaces to the high-level compiler for implementing external back ends. With these interfaces, it is possible to develop a new back end without having to integrate with the existing compiler and, e.g., comply with its requirements in terms of code quality or licensing. At the same time, existing compiler functionality is reused, which greatly lowers the required effort to implement a new back end. By avoiding a separate compiler, code compatibility between individual back ends is also improved and inevitable differences between implementations are avoided. We have added these interfaces to the Julia programming language, and contributed their implementation to the corresponding open-source project. The interfaces grant access to the different IRs (Intermediate Representations) as they exist throughout the compilation process, which includes IR code of the LLVM (Low Level Virtual Machine) library that the Julia compiler uses. To interact efficiently with this library, we have created the LLVM.jl package to interface with the LLVM API (Application Programming Interface) from Julia.
We then demonstrate the potential of these interfaces by implementing a GPU back end for the Julia language. This back end, available as an open-source package under the name CUDAnative.jl, makes it possible to execute Julia code on CUDA (Compute Unified Device Architecture) GPUs. The performance of this high-level GPU code is comparable to equivalent low-level code written in CUDA C, which we demonstrate using the Rodinia benchmark suite for heterogeneous computing. However, high-level GPU programming in Julia is much more productive, less error-prone, and requires no additional expertise in terms of other, low-level programming languages. As a high-level language, Julia has several language features that enable powerful abstractions. For example, automatic specialization of generic code can be used to build higher-order array abstractions. These abstractions can be used to separate abstract operations on arrays from their concrete implementation responsible for allocating memory, executing the operation, etc. This results in concise and readable code, and does not require the high-level programmer to know how the underlying data structure is implemented. Using the CUDAnative.jl GPU back end, we have conducted research into these array abstractions for GPUs without the typical barrier between high-level application code and low-level infrastructure. We have implemented this research in the CuArrays.jl package, and used it to demonstrate how array abstractions are powerful enough to implement realistic applications. Courtesy of Julia’s higher-order array abstractions, these implementations are platform-agnostic. We illustrate this by executing the applications on a variety of platforms, including CPUs, GPUs with CuArrays.jl, and distributed clusters of CPUs and GPUs using DistributedArrays.jl. Finally, we show that array abstractions are also useful for algorithmic optimizations, such as automatic differentiation of GPU array abstractions.
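The separation between abstract array operations and their concrete implementation can be illustrated with a small sketch. It is written in Python purely for illustration and is not taken from the dissertation or from CuArrays.jl: the names `AbstractArray`, `HostArray`, `ChunkedArray` and `sum_of_squares` are hypothetical, and `ChunkedArray` is only a stand-in for a distributed or device-resident container. The point is that the application function is written once against higher-order operations and runs unchanged on either container.

```python
from abc import ABC, abstractmethod
from functools import reduce


class AbstractArray(ABC):
    """Hypothetical interface: only higher-order operations are exposed."""

    @abstractmethod
    def map(self, f):
        """Apply f to every element and return a new array of the same kind."""

    @abstractmethod
    def reduce(self, op, init):
        """Combine all elements with the associative operator op."""


class HostArray(AbstractArray):
    """Plain in-memory implementation backed by a Python list."""

    def __init__(self, data):
        self.data = list(data)

    def map(self, f):
        return HostArray(f(x) for x in self.data)

    def reduce(self, op, init):
        return reduce(op, self.data, init)


class ChunkedArray(AbstractArray):
    """Stand-in for a distributed array: elements live in separate chunks."""

    def __init__(self, data, chunk_size=2):
        data = list(data)
        self.chunks = [data[i:i + chunk_size]
                       for i in range(0, len(data), chunk_size)]

    def map(self, f):
        out = ChunkedArray([])
        # Each chunk is processed independently, as a worker or device would.
        out.chunks = [[f(x) for x in chunk] for chunk in self.chunks]
        return out

    def reduce(self, op, init):
        # Reduce each chunk locally, then combine the partial results.
        # This is only valid because op is assumed to be associative.
        partials = [reduce(op, chunk) for chunk in self.chunks if chunk]
        return reduce(op, partials, init)


def sum_of_squares(arr):
    """Application logic: knows nothing about how arr stores its data."""
    return arr.map(lambda x: x * x).reduce(lambda a, b: a + b, 0)


data = [1, 2, 3, 4, 5]
host_result = sum_of_squares(HostArray(data))
chunked_result = sum_of_squares(ChunkedArray(data))
print(host_result, chunked_result)
```

In Julia the same effect is obtained without an explicit class hierarchy, since generic code is specialized automatically for the array type it receives; the explicit interface here only makes the dispatch visible.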
Differentiation is necessary to compute gradients as they occur in neural networks and other ML (Machine Learning) applications. Typically, the programmer is restricted to specific operations for which the derivative has already been implemented. Using automatic differentiation, arbitrary code can be differentiated, but often at the expense of performance. By exploiting the structure of the broadcast array abstraction, we can efficiently differentiate operations even when they use dynamic control flow. Our implementation in Julia builds on CuArrays.jl, which makes it possible to use our technique on the GPU. The compiler interfaces and accompanying GPU back end as presented in this dissertation provide an important basis for research into high-level programming and abstractions for hardware accelerators using a general-purpose programming language. We have demonstrated this by improving the GPU programming experience in Julia at different levels of abstraction. These improvements are significant but incremental, and we expect future research to fully exploit the high-level features of the Julia language for the purpose of novel abstractions and new GPU programming models.
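To make the idea of differentiating a broadcast concrete, the following is a minimal sketch in Python. It assumes forward-mode differentiation with dual numbers applied per element, which is one common way to exploit the fact that a broadcast applies the same scalar function independently to each element; the abstract does not state that this is the exact mechanism used. The names `Dual`, `broadcast_with_gradient` and `leaky_square` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to the input."""
    value: float
    deriv: float

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__

    def __gt__(self, other):
        # Comparisons look only at the value, so ordinary branches work.
        return self.value > _lift(other).value


def _lift(x):
    """Treat plain numbers as constants with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def broadcast_with_gradient(f, xs):
    """Apply scalar f to each element; return outputs and df/dx per element."""
    outs = [_lift(f(Dual(float(x), 1.0))) for x in xs]
    return [o.value for o in outs], [o.deriv for o in outs]


def leaky_square(x):
    """Scalar function with data-dependent (dynamic) control flow."""
    if x > 0:
        return x * x
    return 0.1 * x


values, grads = broadcast_with_gradient(leaky_square, [-2.0, 0.5, 3.0])
print(values, grads)
```

Because each element is handled independently, the branch taken for one element does not affect the derivative of another, and no record of the executed operations has to be kept. On a GPU the per-element work would run inside a single kernel; the list comprehension here is only a sequential stand-in for that.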

Tags: Computer science, CUDA, Heterogeneous systems, High-level Languages, LLVM, nVidia, nVidia GeForce GTX 1080, Package, Thesis

December 29, 2019 by hgpu

