Algorithmic GPGPU Memory Optimization

Byunghyun Jang, Minsu Choi, Kyung Ki Kim

Heterogeneous Systems Research (HEROES) Laboratory, The University of Mississippi, University, MS 38677, USA

International SoC Design Conference (ISOCC), 2013

The performance of General-Purpose computation on Graphics Processing Units (GPGPU) depends heavily on memory access behavior. This sensitivity stems from the combination of the Massively Parallel Processing (MPP) execution model underlying GPUs and the lack of architectural support for irregular memory access patterns. Application performance can be improved significantly by applying memory-access-pattern-aware optimizations that exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology, based on a comprehensive static memory access pattern analysis, that semi-automatically finds the best mapping of the memory accesses in serial loop nests onto the underlying data-parallel architecture. To that end, we present a simple yet powerful mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search a large design space for an appropriate thread mapping and work-group size. To evaluate the effectiveness of our methodology, we have created a tool that incorporates the proposed algorithmic optimizations, and we report execution speedups on selected benchmark kernels covering a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are obtained with the industry-standard heterogeneous programming language OpenCL, targeting the NVIDIA GT200 architecture.
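The abstract does not spell out the model itself. As a rough illustration only, the sketch below assumes a common simplification: each array reference in a loop nest is an affine function of the loop indices, so the coefficient of whichever loop is mapped to OpenCL work-item dimension 0 gives the element stride between neighbouring work-items. All names (`AffineAccess`, `classify`, `mapping_cost`, `best_fast_loop`, `reuse_dims`) and the cost function are hypothetical and are not the authors' model or tool; the work-group-size search is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AffineAccess:
    """One array reference in a loop nest, modelled (hypothetically) as the
    flattened row-major index sum(coeffs[k] * loop_index[k]) + offset."""
    array: str
    coeffs: tuple  # one integer coefficient per loop level, outermost first
    offset: int = 0

def classify(access, fast_loop):
    """Classify an access by the element stride between neighbouring
    work-items when loop level `fast_loop` is mapped to work-item dimension 0."""
    stride = access.coeffs[fast_loop]
    if stride == 0:
        return "uniform"      # neighbouring work-items read the same element
    if abs(stride) == 1:
        return "unit-stride"  # neighbouring work-items touch adjacent elements
    return "strided"          # accesses are spread |stride| elements apart

def mapping_cost(accesses, fast_loop):
    """Hypothetical cost: 1 per uniform or unit-stride access and
    |stride| per strided access (a crude stand-in for wasted transactions)."""
    cost = 0
    for acc in accesses:
        if classify(acc, fast_loop) == "strided":
            cost += abs(acc.coeffs[fast_loop])
        else:
            cost += 1
    return cost

def best_fast_loop(accesses, parallel_loops):
    """Choose which parallel loop level to map to work-item dimension 0."""
    return min(parallel_loops, key=lambda k: mapping_cost(accesses, k))

def reuse_dims(access, parallel_loops):
    """Parallel loop levels whose index does not appear in the access.
    All work-items along such a dimension touch the same data, so the array
    is a candidate for staging in __local memory (a common heuristic)."""
    return [k for k in parallel_loops if access.coeffs[k] == 0]

# Example: naive N x N matrix multiply, loops (i, j, k); i and j are parallel,
# k is the serial reduction loop.  C[i][j] += A[i][k] * B[k][j]
N = 1024
accesses = [
    AffineAccess("C", (N, 1, 0)),  # C[N*i + j]
    AffineAccess("A", (N, 0, 1)),  # A[N*i + k]
    AffineAccess("B", (0, 1, N)),  # B[N*k + j]
]
PARALLEL = (0, 1)
print(best_fast_loop(accesses, PARALLEL))  # prints 1 (map j to dimension 0)
for acc in accesses:
    print(acc.array, reuse_dims(acc, PARALLEL))  # C [], A [1], B [0]
```

For the naive matrix multiply, mapping `j` to dimension 0 makes the accesses to `C` and `B` unit-stride and the access to `A` uniform, while the reuse check flags `A` and `B` as the operands usually tiled into `__local` memory.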

Tags: Algorithms, Computer science, Heterogeneous systems, Memory model, nVidia, nVidia GeForce GTX 285, OpenCL, Performance

September 15, 2013 by hgpu
