high performance computing on graphics processing units: hgpu.org


Trellis: Portability Across Architectures with a High-level Framework

Lukasz G. Szafaryn, Todd Gamblin, Bronis R. de Supinski, Kevin Skadron

University of Virginia

Journal of Parallel and Distributed Computing, 2013

DOI:10.1016/j.jpdc.2013.07.001

@article{szafaryn2013trellis,

title={Trellis: Portability across architectures with a high-level framework},

author={Szafaryn, Lukasz G and Gamblin, Todd and De Supinski, Bronis R and Skadron, Kevin},

journal={Journal of Parallel and Distributed Computing},

year={2013},

publisher={Elsevier}

}


The increasing computational needs of parallel applications inevitably require portability across parallel architectures, which now include heterogeneous processing resources, such as CPUs and GPUs, and multiple SIMD/SIMT widths. However, the lack of a common parallel programming paradigm that provides predictable, near-optimal performance on each resource leads to the use of low-level frameworks with architecture-specific optimizations, which in turn cause the code base to diverge and make porting difficult. Our experiences with parallel applications and frameworks lead us to conclude that achieving performance portability requires structured code, a common set of high-level directives, and efficient mapping onto hardware. To demonstrate this concept, we develop Trellis, a prototype programming framework that allows the programmer to maintain a single generic and structured codebase that executes efficiently on both the CPU and the GPU. Our approach annotates such code with a single set of high-level directives, derived from both OpenMP and OpenACC, that is compatible with both architectures. Most importantly, motivated by the limitations of the OpenACC compiler in transforming such code into a GPU kernel, we introduce a thread synchronization directive and a set of transformation techniques that allow us to obtain GPU code with the desired parallelization and better performance. While a common high-level programming framework for both CPU and GPU is currently not available, our analysis shows that even obtaining the best-case performance with OpenACC, the state-of-the-art solution for a GPU, requires modifying the structure of codes to properly exploit braided parallelism and to cope with conditional statements or serial sections. Although this already requires prior knowledge of compiler behavior, optimal performance remains unattainable due to the lack of synchronization. We describe the contributions of Trellis in addressing these problems by showing how it achieves correct parallelization of the original codes for three parallel applications, with performance competitive with that of OpenMP and CUDA, improved programmability, and reduced overall code length.
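As a rough illustration of the divergence the abstract describes, the C sketch below annotates the same loop twice: once with a standard OpenMP directive for the CPU and once with a standard OpenACC directive for an accelerator. It does not use Trellis's own directive syntax, which is not given on this page; the function names `saxpy_omp` and `saxpy_acc` are hypothetical and chosen only for this example. A compiler built without OpenMP or OpenACC support ignores the pragmas and runs both loops serially, so the results are the same either way.

```c
/* Hypothetical helper (not from the paper): y = a*x + y,
 * annotated for multicore CPUs with a standard OpenMP directive. */
void saxpy_omp(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* The same loop, annotated for an accelerator with a standard
 * OpenACC directive. The data clauses describe which arrays are
 * copied to and from the device. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The loop bodies are identical; only the annotations differ. Maintaining both variants is the kind of duplication that a single directive set, as proposed in the abstract, is meant to avoid.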

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, OpenACC, Tesla C2050

September 27, 2013 by hgpu


