Dynamic Instrumentation and Optimization for GPU Applications

Naila Farooqui, Christopher J. Rossbach, Yuan Yu

Georgia Institute of Technology

The 4th Workshop on Systems for Future Multicore Architectures (SFMA’14), 2014

Parallel architectures like GPUs are a tantalizing compute fabric for performance-hungry developers. While GPUs enable order-of-magnitude performance increases in many data-parallel application domains, writing efficient code that actually realizes those increases is a non-trivial endeavor, typically requiring developers to exercise specialized architectural features exposed directly in the programming model. Achieving good performance on GPUs involves effort-intensive tuning, in which the programmer manually evaluates multiple code versions in search of an optimal combination of problem decomposition with architecture- and runtime-specific parameters. For developers struggling to apply GPUs to more general-purpose computing problems, the introduction of irregular data structures and access patterns exacerbates these challenges and increases the level of effort required. This paper proposes to automate much of this effort using dynamic instrumentation to inform dynamic, profile-driven optimizations. In this vision, the programmer expresses the application using higher-level front-end programming abstractions such as Dandelion [13], allowing the system, rather than the programmer, to explore the implementation and optimization space. We argue that such a system is both feasible and urgently needed. We present the design for such a framework, called Leo. For a range of benchmarks, we demonstrate that a system implementing our design can achieve speedups of 1.12x to 27x in kernel runtimes, which translates to a 9% to 40% improvement in end-to-end performance.

Tags: Computer science, CUDA, nVidia, Performance, Programming techniques, Tesla M2075

April 16, 2014 by hgpu
