Evaluation and enhancement of memory efficiency targeting general-purpose computations on scalable data-parallel GPU architectures

Byunghyun Jang

College of Engineering, Department of Electrical and Computer Engineering, Northeastern University

Northeastern University, 2011

@article{jang2011evaluation,
  title={Evaluation and enhancement of memory efficiency targeting general-purpose computations on scalable data-parallel GPU architectures},
  author={Jang, B.},
  year={2011}
}

This thesis addresses the memory efficiency of general-purpose applications running on massively multi-threaded, data-parallel GPU architectures. Although scalable, data-parallel GPU architectures and their associated general-purpose programming models offer impressive computational capability and attractive power budgets, the pace of migrating general-purpose applications to this emerging class of architectures is significantly hindered by the efficiency of the memory subsystem present on these platforms. Programmers are forced to optimize the memory behavior of their code if they want to reap the full benefits of these high-performance, data-parallel architectures. In this thesis, we present a comprehensive study of memory access behavior for data-parallel workloads targeting GPUs, and present an algorithmic methodology to address memory inefficiency issues. We establish a mathematical model of memory behavior that enables us to optimize memory system performance. We present a comprehensive analysis of memory access patterns that fully incorporates the influence of thread mapping and explains the memory behavior of kernels running on GPU hardware. This modeling and analysis serves as a theoretical foundation throughout this thesis. We then show how this new model of memory system activity can be used to enhance the memory efficiency of kernels through a series of algorithmic memory efficiency enhancement techniques. The techniques explored in this thesis include: 1) vectorization via data transformations on vector-based GPU architectures, 2) appropriate memory space selection, and 3) search for an optimized thread mapping and work group size. To demonstrate the power of our proposed algorithmic methodology, we develop a tool that implements this approach and test it on a diverse class of general-purpose benchmark applications. The experiments are conducted using the industry-standard heterogeneous programming language, OpenCL, on two mainstream GPU platforms available in the market.
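As a rough illustration of why thread mapping matters for memory efficiency, the sketch below counts how many aligned memory segments a group of neighboring threads touches under two different mappings onto a row-major 2D array. It is not the model developed in the thesis. The 128-byte segment size, the group of 32 threads, the 1024-element row width, and the function names are all assumptions chosen for illustration.

```python
def count_transactions(addresses, segment_bytes=128):
    """Count the distinct aligned segments touched by a set of byte addresses.

    Assumption: one memory transaction is needed per distinct segment.
    """
    return len({addr // segment_bytes for addr in addresses})


def group_addresses(mapping, group_size=32, width=1024, elem_bytes=4):
    """Byte addresses read by `group_size` neighboring threads.

    The array is stored row-major with `width` elements per row.
    mapping="row":    thread i reads element (0, i), so accesses are contiguous.
    mapping="column": thread i reads element (i, 0), so accesses are one row apart.
    """
    addresses = []
    for tid in range(group_size):
        if mapping == "row":
            row, col = 0, tid
        elif mapping == "column":
            row, col = tid, 0
        else:
            raise ValueError("mapping must be 'row' or 'column'")
        addresses.append((row * width + col) * elem_bytes)
    return addresses


# Hypothetical comparison of the two mappings for one group of threads.
row_tx = count_transactions(group_addresses("row"))
col_tx = count_transactions(group_addresses("column"))
print("row mapping:", row_tx, "transaction(s)")
print("column mapping:", col_tx, "transaction(s)")
```

Under these assumptions the row mapping touches a single segment while the column mapping touches one segment per thread, which is the kind of gap a search over thread mappings and work group sizes would aim to close.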

Tags: Algorithms, ATI, ATI Radeon HD 5870, Benchmarking, Computer science, Heterogeneous systems, nVidia, nVidia GeForce GTX 285, OpenCL, Optimization, Thesis

January 18, 2012 by hgpu
