Reuse and Refactoring of GPU Kernels to Design Complex Applications

Santonu Sarkar, Sayantan Mitra, Ashok Srinivasan

Infosys Labs, Infosys Ltd., Bangalore 560100, India

Infosys Labs, TR-120131, 2012

@article{sarkar2012reuse,
  title={Reuse and Refactoring of GPU Kernels to Design Complex Applications},
  author={Sarkar, S. and Mitra, S. and Srinivasan, A.},
  year={2012}
}

Developers of GPU kernels, such as FFTs and linear solvers, tune their code extensively to obtain optimal performance, making efficient use of the different resources available on the GPU. Complex applications are composed of several such kernel components. The software engineering community has performed extensive research on component-based design to build generic and flexible components that can be reused across diverse applications, rather than on optimizing their performance. Since a GPU is used primarily to improve performance, application performance becomes a key design issue. The contribution of our work lies in extending component-based design research in a new direction: the performance impact of refactoring an application composed of highly tuned kernels. Such refactoring can make the composition use GPU resources more effectively, especially when combined with suitable scheduling.

Here we propose a methodology in which developers of highly tuned kernels enable application designers to optimize the performance of the composition. Kernel developers characterize the performance of a kernel through its "performance signature". The application designer combines these kernels so that the refactored kernel performs better than the individual kernels executed separately. This is partly based on the observation that different kernels may make unbalanced use of different GPU resources, such as different types of memory. Kernels may also have the potential to share data. Refactoring the kernels, combining them, and scheduling them suitably can therefore improve performance. We study different types of potential design optimizations and evaluate their effectiveness on different types of kernels; this may even involve choosing non-optimal parameters for an individual kernel.

We analyze how our techniques change the performance signature of the composition relative to those of the individual kernels. We demonstrate that our techniques yield over 50% improvement for some kernels. Furthermore, the performance of a basic molecular dynamics application can be improved by around 25.7% on a Fermi GPU, compared with an un-refactored implementation.
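To make the resource-balancing argument concrete, here is a minimal sketch in Python. It is not the authors' "performance signature", whose format the abstract does not specify. It assumes a simplified roofline-style model: each kernel is described by a hypothetical compute-only time and memory-only time, a kernel running alone is limited by the larger of the two, and an ideally refactored composition overlaps one kernel's arithmetic with another's memory traffic while paying only once for data the kernels share. The names `Signature`, `standalone_ms`, `sequential_ms`, and `fused_ms`, and all timings, are illustrative assumptions, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical, simplified performance signature of one kernel.

    compute_ms: time the kernel would take if limited only by arithmetic.
    memory_ms:  time it would take if limited only by global-memory traffic.
    """
    name: str
    compute_ms: float
    memory_ms: float


def standalone_ms(k: Signature) -> float:
    # Roofline-style assumption: a kernel running alone is bound by
    # whichever resource it saturates first.
    return max(k.compute_ms, k.memory_ms)


def sequential_ms(kernels: list[Signature]) -> float:
    # Un-refactored composition: kernels are launched one after another.
    return sum(standalone_ms(k) for k in kernels)


def fused_ms(kernels: list[Signature], shared_memory_ms: float = 0.0) -> float:
    # Idealized refactored composition: one kernel's arithmetic overlaps
    # another's memory traffic, and traffic for data the kernels share
    # (shared_memory_ms) is paid only once.
    compute = sum(k.compute_ms for k in kernels)
    memory = max(0.0, sum(k.memory_ms for k in kernels) - shared_memory_ms)
    return max(compute, memory)


# Made-up timings: one memory-bound kernel and one compute-bound kernel.
mem_bound = Signature("mem_bound", compute_ms=2.0, memory_ms=6.0)
compute_bound = Signature("compute_bound", compute_ms=5.0, memory_ms=1.5)

seq = sequential_ms([mem_bound, compute_bound])
fused = fused_ms([mem_bound, compute_bound], shared_memory_ms=1.0)
print(f"sequential={seq:.1f} ms fused={fused:.1f} ms gain={1 - fused / seq:.0%}")
```

With these invented numbers the script prints `sequential=11.0 ms fused=7.0 ms gain=36%`. The model is an optimistic bound: a real fused kernel also competes for registers and shared memory, which can lower occupancy, and this trade-off may be why a refactored composition sometimes calls for non-optimal parameters in an individual kernel.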

Tags: Computer science, CUDA, FFT, Molecular dynamics, nVidia, nVidia GeForce GTX 480, Optimization, Software Engineering, Tesla S1050

February 24, 2012 by hgpu
