high performance computing on graphics processing units: hgpu.org

Improving the Programmability of GPU Architectures

Cedric Nugteren

Technische Universiteit Eindhoven, NVIDIA

Cedric Nugteren, 2014

Source code package: Bones

Throughout the past decades, the tremendous growth of single-core performance has been the key enabler for digital technology to become ubiquitous in our society. Recently, diminishing returns on Dennard scaling resulted in power dissipation issues, leading to reduced performance growth. Performance growth has since been re-enabled by multi-core processors as well as by exploiting the energy efficiency of specialised accelerators such as graphics processing units (GPUs). This has led to a heterogeneous and parallel computing environment, making programming a challenging task. Programmers are faced with a variety of new languages and are required to deal with the architecture’s parallelism and memory hierarchy. This has become increasingly important, especially considering the memory wall and the prospect of dark silicon. Apart from programming, issues such as code maintainability and portability have become of major importance. To address these issues, this thesis first introduces algorithmic species: a classification of program code based on memory access patterns. Algorithmic species is a structured classification that programmers and compilers can use, for example, to make parallelisation decisions or to perform memory access optimisations. The algorithmic species classification is used in a skeleton-based compiler to automatically generate efficient and readable code for GPUs and other parallel processors. To do so, C code is first automatically annotated with species information. The annotated code is subsequently fed into bones, a source-to-source compiler that provides pre-optimised code templates ("skeletons") for specific algorithmic species. By applying traditional and species-based optimisations such as thread coarsening and kernel fusion on top of this, bones is able to generate competitive code.
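The two optimisations named above, thread coarsening and kernel fusion, can be illustrated with a small sequential emulation. This is only a sketch under stated assumptions: it is not output of bones, the `launch` helper and all kernel names are hypothetical, and the "threads" are simply loop iterations on the CPU. Kernel fusion merges two element-wise kernels so the intermediate array is never written out; thread coarsening lets one thread handle several elements.

```python
# Hypothetical emulation: a "kernel" is a function called once per thread id.
def launch(kernel, num_threads, *args):
    for thread_id in range(num_threads):
        kernel(thread_id, *args)

def scale_kernel(thread_id, src, dst, factor):
    dst[thread_id] = src[thread_id] * factor

def offset_kernel(thread_id, src, dst, offset):
    dst[thread_id] = src[thread_id] + offset

def fused_kernel(thread_id, src, dst, factor, offset):
    # Kernel fusion: the scaled value stays in a temporary instead of
    # being written to and re-read from an intermediate array.
    dst[thread_id] = src[thread_id] * factor + offset

def coarsened_fused_kernel(thread_id, src, dst, factor, offset, coarsening):
    # Thread coarsening: one thread processes `coarsening` consecutive elements.
    start = thread_id * coarsening
    for i in range(start, min(start + coarsening, len(src))):
        dst[i] = src[i] * factor + offset

data = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(data)

# Two separate kernels with an intermediate array.
tmp, unfused = [0.0] * n, [0.0] * n
launch(scale_kernel, n, data, tmp, 2.0)
launch(offset_kernel, n, tmp, unfused, 1.0)

# One fused kernel, one thread per element.
fused = [0.0] * n
launch(fused_kernel, n, data, fused, 2.0, 1.0)

# Fused and coarsened: half as many threads (rounded up), two elements each.
coarsened = [0.0] * n
launch(coarsened_fused_kernel, (n + 1) // 2, data, coarsened, 2.0, 1.0, 2)
```

All three variants compute the same result; they differ only in how much work each thread does and how much intermediate data is moved through memory.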
Combining skeletons with a program code classification (the species) creates a unique code generation approach, integrating a skeleton-based compiler into an automated compilation flow for the first time. Furthermore, this thesis proposes to change the GPU’s thread scheduling mechanism to improve its programmability. Programming models for GPUs allow programmers to specify the independence of threads, removing ordering constraints. Still, GPUs do not exploit the potential for locality (e.g. improving cache performance) enabled by this independence: threads are scheduled in a fixed order. This thesis quantifies the potential of scheduling in a "locality-aware" manner. A detailed reuse-distance based cache model for GPUs is introduced to provide better insight into locality and cache behaviour.
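To make the notion of reuse distance concrete, the sketch below computes the classic sequential reuse distance (the number of distinct addresses touched between two accesses to the same address) and uses it to count hits in a fully associative LRU cache. It is a minimal illustration, not the thesis's cache model, which is described as detailed and GPU-specific; the function names and the toy thread-to-address mapping are assumptions made for this example.

```python
def reuse_distances(trace):
    """Reuse distance per access; None marks a first-time (cold) access."""
    stack = []       # LRU stack, most recently used address last
    distances = []
    for address in trace:
        if address in stack:
            position = stack.index(address)
            # Distinct addresses accessed since the previous use of `address`.
            distances.append(len(stack) - 1 - position)
            stack.pop(position)
        else:
            distances.append(None)
        stack.append(address)
    return distances

def lru_hits(trace, capacity):
    """Hits in a fully associative LRU cache holding `capacity` addresses."""
    return sum(1 for d in reuse_distances(trace)
               if d is not None and d < capacity)

# Toy example (assumed mapping): threads 0 and 2 read line "X",
# threads 1 and 3 read line "Y".
line_of_thread = {0: "X", 1: "Y", 2: "X", 3: "Y"}

fixed_order = [0, 1, 2, 3]            # threads scheduled in index order
locality_aware_order = [0, 2, 1, 3]   # threads sharing a line run back to back

fixed_trace = [line_of_thread[t] for t in fixed_order]
local_trace = [line_of_thread[t] for t in locality_aware_order]

fixed_hits = lru_hits(fixed_trace, capacity=1)
local_hits = lru_hits(local_trace, capacity=1)
```

With a one-line cache, the fixed order alternates between the two lines and never hits, while the reordered schedule hits on every second access. Because the threads are independent, both orders are valid, which is the kind of potential a locality-aware scheduler could exploit.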

Tags: Algorithms, Computer science, CUDA, nVidia, OpenACC, OpenCL, Package, Performance, Thesis

May 7, 2014 by hgpu
