high performance computing on graphics processing units: hgpu.org

Compiler-Driven Performance on Heterogeneous Computing Platforms

Artem Chikin

Department of Computing Science, University of Alberta

University of Alberta, 2019

Modern parallel programming languages such as OpenMP provide simple, portable programming models that support offloading computation to various accelerator devices. Together with the increasing prevalence of heterogeneous computing platforms and the battle for supremacy in the co-processor space, this places additional demands on compiler and runtime vendors, who must handle the growing complexity and diversity of shared-memory parallel platforms.

To start, this thesis presents three kernel-restructuring ideas that focus on improving the execution of high-level parallel code on GPU devices. The first addresses programs that include multiple parallel blocks within a single region of GPU code: a proposed compiler transformation splits such a region into multiple regions, leading to the launch of multiple kernels, one for each parallel region. The second is a code transformation that sets up a pipeline of kernel executions and asynchronous data transfers, enabling the overlap of communication and computation. The third is that the selection of a grid geometry for the execution of a parallel region must balance GPU occupancy against the potential saturation of the GPU's memory throughput. Adding this parameter to the geometry-selection heuristic can often yield better performance at lower occupancy levels.

This thesis next describes the Iteration Point Difference Analysis, a new static-analysis framework that can be used to determine the memory-coalescing characteristics of parallel loops that target GPU offloading, and to ascertain the safety and profitability of loop transformations aimed at improving their memory-access characteristics. Across the Polybench suite, GPU kernel execution time is improved by up to 25.5x on an Nvidia P100, with overall benchmark improvements of up to 3.2x.
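To make the idea of an iteration point difference concrete, here is a minimal Python sketch, not the framework from the thesis. It assumes affine array subscripts with concrete integer coefficients (a real static analysis must also cope with symbolic bounds and strides), assumes that consecutive values of the parallel loop variable run on adjacent GPU threads, and uses hypothetical names (`AffineAccess`, `classify`, `best_thread_var`). It computes the byte distance between the addresses touched by two neighbouring iteration points and labels each access from that distance.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AffineAccess:
    """Hypothetical model of one array reference whose element offset is
    sum(coeffs[v] * v for each loop variable v) + const."""
    name: str
    coeffs: Dict[str, int]  # loop variable -> coefficient, in elements
    const: int = 0
    elem_size: int = 8      # bytes per element (8 for double)


def iteration_point_difference(acc: AffineAccess, thread_var: str) -> int:
    """Byte distance between the addresses accessed by two iteration points
    that differ by +1 in thread_var, all other loop variables held fixed.
    For an affine offset this is simply coefficient * element size."""
    return acc.coeffs.get(thread_var, 0) * acc.elem_size


def classify(acc: AffineAccess, thread_var: str) -> str:
    """Label the pattern seen by adjacent GPU threads, assuming consecutive
    values of thread_var execute on consecutive threads."""
    d = abs(iteration_point_difference(acc, thread_var))
    if d == 0:
        return "same-address"  # every thread reads one location
    if d == acc.elem_size:
        return "coalesced"     # adjacent threads touch adjacent elements
    return "uncoalesced"       # strided: accesses spread over many segments


def best_thread_var(accesses: List[AffineAccess], candidates: List[str]) -> str:
    """Profitability only: pick the loop variable that, if mapped to adjacent
    threads (e.g. after a loop interchange), leaves the fewest uncoalesced
    accesses. Legality must be checked separately with dependence analysis."""
    def good(v: str) -> int:
        return sum(classify(a, v) != "uncoalesced" for a in accesses)
    return max(candidates, key=good)


# for i (parallel, mapped to threads): for j: C[i][j] = A[i][j] + B[j]
N = 1024
accs = [AffineAccess("C", {"i": N, "j": 1}),
        AffineAccess("A", {"i": N, "j": 1}),
        AffineAccess("B", {"j": 1})]
print([classify(a, "i") for a in accs])  # ['uncoalesced', 'uncoalesced', 'same-address']
print(best_thread_var(accs, ["i", "j"]))  # j
```

Mapping `i` to threads gives a stride of `N` elements for `A` and `C`, while mapping `j` makes all three accesses coalesced; whether that interchange is actually safe is a separate question the sketch does not answer.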
An opportunity detected in a SPEC ACCEL benchmark yields a kernel speedup of 86.5x with a benchmark improvement of 3.4x on an Nvidia P100, and a kernel speedup of 111.1x with a benchmark improvement of 2.3x on an Nvidia V100.

The task of modelling performance takes on ever-increasing importance as systems must make automated decisions about the most suitable offloading target. The third contribution of this thesis motivates this need with a study of cross-architectural changes in the profitability of offloading kernels to the GPU versus executing them on the host CPU, and presents a prototype design for a hybrid computing device selection framework.
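The device-selection question can be illustrated with a deliberately simple cost model. The sketch below is not the framework from the thesis: `Platform`, `predict_times`, `select_device`, and every throughput figure are hypothetical. It compares a predicted host time against a predicted offload time that includes launch overhead and host-device transfers, and shows how changing one architectural parameter (here, link bandwidth) can flip the decision for the same kernel.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Platform:
    """Hypothetical per-machine parameters; a real framework would calibrate
    or learn these rather than hard-code them."""
    cpu_flops: float          # sustained host throughput, FLOP/s
    gpu_flops: float          # sustained device throughput, FLOP/s
    link_bytes_per_s: float   # host <-> device transfer bandwidth
    launch_overhead_s: float  # fixed cost of one offload


def predict_times(flops: float, bytes_moved: float, p: Platform) -> Tuple[float, float]:
    """Return (host_seconds, offload_seconds) under a naive additive model
    in which transfers and kernel execution do not overlap."""
    host = flops / p.cpu_flops
    offload = (p.launch_overhead_s
               + bytes_moved / p.link_bytes_per_s
               + flops / p.gpu_flops)
    return host, offload


def select_device(flops: float, bytes_moved: float, p: Platform) -> str:
    """Pick whichever target the model predicts to be faster."""
    host, offload = predict_times(flops, bytes_moved, p)
    return "gpu" if offload < host else "cpu"


# Same kernel (1 GFLOP, 800 MB moved) on two hypothetical machines that
# differ only in host-device bandwidth.
slow_link = Platform(cpu_flops=1e11, gpu_flops=5e12,
                     link_bytes_per_s=1e10, launch_overhead_s=1e-5)
fast_link = Platform(cpu_flops=1e11, gpu_flops=5e12,
                     link_bytes_per_s=1e11, launch_overhead_s=1e-5)
print(select_device(1e9, 8e8, slow_link))  # cpu
print(select_device(1e9, 8e8, fast_link))  # gpu
```

The additive model is the simplest possible choice; it ignores any overlap of transfers with computation, which would lower the predicted offload time and shift the balance toward the GPU.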

Tags: Compilers, Computer science, CUDA, Heterogeneous systems, Hybrid computing, nVidia, Performance, Tesla P100, Tesla V100, Thesis

November 17, 2019 by hgpu

* * *

Recent source codes

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Most viewed papers (last 30 days)

Compiler-Driven Performance on Heterogeneous Computing Platforms
