high performance computing on graphics processing units: hgpu.org

Towards Improving Programmability of Heterogeneous Parallel Architectures

Michele Scandale

Politecnico di Milano, Department of Electronics, Information, and Bioengineering

Politecnico di Milano, 2015

@phdthesis{scandale2015towards,
  title={Towards Improving Programmability of Heterogeneous Parallel Architectures},
  author={Scandale, Michele},
  year={2015},
  school={Politecnico di Milano}
}

Parallel computing has long been considered an effective way to combine performance and power efficiency. From High Performance Computing (HPC) to modern embedded systems, heterogeneous parallel architectures are becoming the common case, since they offer a good tradeoff between performance and power consumption. The exascale objective for the next generation of HPC systems is constrained to a target power envelope of 20 MW to 30 MW. Existing "green" HPC systems cannot yet reach such power efficiency, although they already employ modern heterogeneous parallel architectures. Ultra-low-power hardware platforms are gaining traction, as they may be the key component that allows future HPC systems to meet the required power efficiency. The programmability of such systems is a critical aspect, with a huge impact on both the achievable power efficiency and the effort required to reach it. Programming parallel architectures is a complex task, since many hardware features are directly exposed to programmers. Programming frameworks that try to hide this complexity exist; however, they either deliver sub-optimal performance compared with hand-tuned implementations or are limited to specific application domains. This dissertation tackles challenges related to the programmability of heterogeneous parallel architectures, acting on both existing and future programming models and hardware architectures. In particular, we present OpenCRun, an OpenCL runtime implementation supporting a range of platforms with very different architectural characteristics, such as x86 multicores and embedded parallel accelerators. In the context of ultra-low-power architectures, we report the joint effort between hardware and software developers on the PULP platform, showing the benefits of selected ISA extensions and their compiler support for maximizing power efficiency.
Moreover, to improve the functional and performance portability of OpenCL code between GPGPUs and embedded many-core accelerators with explicitly managed memory, such as PULP and STHorm, we propose a code transformation technique, workitem coalescing, that bypasses the limitations of the embedded platforms and allows code developed for GPGPUs to be ported seamlessly, together with a memory transfer optimization technique that tunes the resulting code for better performance. Finally, to raise the abstraction level more radically, we leverage Shared Virtual Memory, which is expected to be available in future architectures, and present a method to transparently implement shared function pointers in heterogeneous platforms with two or more ISAs, a building block for enabling full C++ support across heterogeneous ISAs. We also present a fallback solution for calling, from the device side, functions that are not available on the device itself. This mechanism is needed to enable transparent support of C++ and to give more flexibility to programmers porting large and complex applications to heterogeneous parallel accelerators.
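
The abstract names workitem coalescing without describing it. The sketch below illustrates the general idea as it is commonly understood in OpenCL runtimes for CPU-like targets: the body written for a single work-item is wrapped in a loop, so that one core executes a whole work-group. It is an illustration under that assumption, not the transformation implemented in the thesis or in OpenCRun. The names `wi_ctx`, `saxpy_workitem`, `saxpy_workgroup`, and `saxpy_launch` are hypothetical, and plain C stands in for OpenCL C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the indices an OpenCL device would supply
   through get_group_id(), get_local_id() and get_local_size(). */
typedef struct {
    size_t group_id;
    size_t local_id;
    size_t local_size;
} wi_ctx;

/* Kernel body as written for a GPGPU: one work-item updates one element. */
static void saxpy_workitem(const wi_ctx *ctx, float a,
                           const float *x, float *y, size_t n)
{
    size_t gid = ctx->group_id * ctx->local_size + ctx->local_id;
    if (gid < n)                 /* guard the partially filled last group */
        y[gid] = a * x[gid] + y[gid];
}

/* Coalesced form: a single core runs every work-item of one work-group
   by iterating over the local ids. */
static void saxpy_workgroup(size_t group_id, size_t local_size, float a,
                            const float *x, float *y, size_t n)
{
    for (size_t lid = 0; lid < local_size; ++lid) {
        wi_ctx ctx = { group_id, lid, local_size };
        saxpy_workitem(&ctx, a, x, y, n);
    }
}

/* Host-side driver: enumerates the work-groups sequentially. A real
   runtime would distribute them across the accelerator's cores. */
static void saxpy_launch(size_t n, size_t local_size, float a,
                         const float *x, float *y)
{
    assert(local_size > 0);
    size_t groups = (n + local_size - 1) / local_size;  /* round up */
    for (size_t g = 0; g < groups; ++g)
        saxpy_workgroup(g, local_size, a, x, y, n);
}
```

Calling `saxpy_launch(5, 2, 2.0f, x, y)` runs three work-groups of two work-items each, with the out-of-range sixth work-item masked by the bounds check. Kernels that contain barriers cannot be handled by a single loop like this one: the loop generally has to be split at each barrier, which this sketch does not show.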

Tags: Compilers, Computer science, Heterogeneous systems, LLVM, OpenCL, Thesis

February 16, 2016 by hgpu


