Modeling and Optimization of Parallel Matrix-based Computations on GPU

Andrew White
Auburn University
Auburn University, 2013

   title={Modeling and Optimization of Parallel Matrix-based Computations on GPU},

   author={White, Andrew},


   school={Auburn University}


Download Download (PDF)   View View   Source Source   



As graphics processing units (GPUs) are continually being utilized as coprocessors, the demand for optimally utilizing them for various applications continues to grow. This work narrows the gap between programmers and minimum execution time for matrix-based computations on a GPU. To minimize execution time, computation and communication time must be considered. For computation, the placement of data in GPU memory significantly affects computation time and therefore is considered. Various matrix-based computation patterns are examined with respect to the layout of GPU memory. A computation pattern refers to the order in which each GPU thread computes a result. From examination of computation patterns, an optimized computation pattern, a pattern which reduces computation time, is derived. After the optimized computation pattern is derived, the access pattern to GPU memory, or order in which data is accessed, is considered. From the optimized access pattern, fine-tuning is performed to the GPU code such as minimization of index calculations and loop unrolling to further reduce computation time and resource allocation. After fine-tuning, input parameters which yield the minimum computation time for a matrix-based computation on the GPU are derived. Input parameters are defined as the dimensions of the grid and blocks assigned for execution on the GPU. Execution metrics are formulated, as functions of the input parameters, to represent the executional behavior of the GPU. The execution metrics are utilized to derive the optimal input parameters which are input parameters that yield the minimum computation time. The matrix-based computations considered are matrix-vector multiplication (Mv), matrix-matrix multiplication (MM), convolution, and the conjugate gradient method. A parallelization procedure is developed to minimize execution time by deriving the optimal placement of data, optimized computation pattern, optimized access pattern, and optimal input parameters. Each step of the procedure is developed through analysis of Mv and MM. The procedure includes an accurate communication model and estimates for CPU and GPU data transfers. With accurate estimates for communication and computation times, partitioning computation for CPU and GPU execution is performed to minimize execution time. Convolution and conjugate gradient are utilized to verify the validity of the procedure. Therefore, the overall goal of this work is to develop a parallelization procedure which minimizes execution time of matrix-based computations executing on the GPU.
VN:F [1.9.22_1171]
Rating: 0.0/5 (0 votes cast)

* * *

* * *

Follow us on Twitter

HGPU group

1546 peoples are following HGPU @twitter

Like us on Facebook

HGPU group

275 people like HGPU on Facebook

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: