high performance computing on graphics processing units: hgpu.org

Model-driven optimisation of memory hierarchy and multithreading on GPUs

Andrew A. Haigh, Eric C. McCreath

Research School of Computer Science, The Australian National University, Canberra, Australia

13th Australasian Symposium on Parallel and Distributed Computing (AusPDC 2015), 2015

@article{haigh2015model,
  title={Model-driven optimisation of memory hierarchy and multithreading on GPUs},
  author={Haigh, Andrew A and McCreath, Eric C},
  year={2015}
}

Due to their potentially high peak performance and energy efficiency, GPUs are increasingly popular for scientific computations. However, the complexity of the architecture makes it difficult to write code that achieves high performance. Two of the most important factors in achieving high performance are the usage of the GPU memory hierarchy and the way in which work is mapped to threads and blocks. The dominant frameworks for GPU computing, CUDA and OpenCL, leave these decisions largely to the programmer. In this work, we address this in part by proposing a technique that simultaneously manages use of the GPU low-latency shared memory and chooses the granularity with which to divide the work (block size). We show that a relatively simple heuristic based on an abstraction of the GPU architecture is able to make these decisions and achieve average performance within 17% of an optimal configuration on an NVIDIA Tesla K20.
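To make the kind of decision described above concrete, the following Python sketch picks a block size from an abstract model of one GPU multiprocessor. It is not the authors' heuristic, which the abstract does not spell out. It swaps in a much simpler occupancy-maximising rule, and the names `SMLimits`, `resident_blocks`, and `choose_block_size` are hypothetical. The default limits are assumed values in the range reported for Kepler-class devices such as the Tesla K20 and should be checked against the vendor documentation before reuse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Assumed per-multiprocessor limits (illustrative, not taken from the paper)."""
    max_threads: int = 2048            # resident threads per multiprocessor
    max_blocks: int = 16               # resident blocks per multiprocessor
    shared_mem_bytes: int = 48 * 1024  # shared memory available to resident blocks
    max_threads_per_block: int = 1024
    warp_size: int = 32


def resident_blocks(block_size: int, smem_per_block: int, lim: SMLimits) -> int:
    """Number of blocks that fit on one multiprocessor at the same time."""
    if block_size > lim.max_threads_per_block or smem_per_block > lim.shared_mem_bytes:
        return 0  # this configuration cannot be launched under the model
    by_threads = lim.max_threads // block_size
    # A block that uses no shared memory is not limited by it.
    by_smem = lim.shared_mem_bytes // smem_per_block if smem_per_block else lim.max_blocks
    return min(lim.max_blocks, by_threads, by_smem)


def choose_block_size(smem_bytes_per_thread: int, lim: SMLimits = SMLimits()):
    """Return (block_size, occupancy) with the highest modelled occupancy.

    Candidates are multiples of the warp size; ties go to the smallest block.
    """
    best = None
    for block_size in range(lim.warp_size, lim.max_threads_per_block + 1, lim.warp_size):
        smem_per_block = block_size * smem_bytes_per_thread
        blocks = resident_blocks(block_size, smem_per_block, lim)
        occupancy = blocks * block_size / lim.max_threads
        if best is None or occupancy > best[1]:
            best = (block_size, occupancy)
    return best


if __name__ == "__main__":
    for per_thread in (0, 24, 48):
        print(per_thread, choose_block_size(per_thread))
```

Under these assumed limits, staging more data per thread in shared memory lowers the number of threads that can be resident at once, which is the trade-off between memory placement and block size that the abstract refers to. A model aimed at real performance would also need to account for memory latency and data reuse, which this sketch ignores.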

Tags: Computer science, CUDA, Memory, nVidia, nVidia GeForce GTX 580, Tesla K20

March 2, 2015 by hgpu
