high performance computing on graphics processing units: hgpu.org


A framework for dynamically instrumenting GPU compute applications within GPU Ocelot

Naila Farooqui, Andrew Kerr, Gregory Diamos, S. Yalamanchili, K. Schwan

Georgia Institute of Technology, Atlanta, GA

GPGPU-4 Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, 2011

DOI:10.1145/1964179.1964192

@inproceedings{farooqui2011framework,
  title={A framework for dynamically instrumenting gpu compute applications within gpu ocelot},
  author={Farooqui, N. and Kerr, A. and Diamos, G. and Yalamanchili, S. and Schwan, K.},
  booktitle={Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units},
  pages={9},
  year={2011},
  organization={ACM}
}


Source code package: Ocelot


In this paper we present the design and implementation of a dynamic instrumentation infrastructure for PTX programs that procedurally transforms kernels and manages related data structures. We show how performing instrumentation within the GPU Ocelot dynamic compiler infrastructure provides unique capabilities not available to other profiling and instrumentation toolchains for GPU computing. We demonstrate the utility of this instrumentation capability with three example scenarios: (1) performing workload characterization accelerated by a GPU, (2) providing load imbalance information for use by a resource allocator, and (3) providing compute utilization feedback to be used online by a simulated process scheduler that might be found in a hypervisor. Additionally, we measure both (1) the compilation overheads of performing dynamic compilation and (2) the increases in runtime when executing instrumented kernels. For the Parboil benchmark suite, compilation overhead due to instrumentation amounted on average to 69% of the time needed to parse a kernel module. Slowdowns from instrumenting every basic block ranged from 1.5x to 5.5x, with the largest slowdowns attributed to kernels with large numbers of short, compute-bound blocks.
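To make the idea of per-basic-block instrumentation concrete, the following is a minimal Python sketch, not the paper's implementation and not GPU Ocelot's API. It assumes a toy kernel modelled as a dictionary of basic blocks (the names `KERNEL`, `instrument`, `run`, and `load_imbalance` are all hypothetical), wraps every block so that it increments a per-thread counter before running, and then derives a simple load imbalance figure (maximum over mean of per-thread block counts) of the kind a resource allocator might consume.

```python
from collections import Counter

# Toy "kernel": each basic block is a function that updates per-thread state
# and returns the label of the next block (None means the thread exits).
# This is a hypothetical stand-in for PTX basic blocks, not Ocelot's IR.
def entry(state):
    return "loop" if state["n"] > 0 else "exit"

def loop(state):
    state["n"] -= 1
    return "loop" if state["n"] > 0 else "exit"

def exit_block(state):
    return None

KERNEL = {"entry": entry, "loop": loop, "exit": exit_block}

def instrument(kernel, counters):
    """Return a transformed kernel in which every basic block first bumps
    a per-thread, per-block execution counter, then runs the original block."""
    def wrap(label, block):
        def instrumented(state):
            counters[state["tid"]][label] += 1
            return block(state)
        return instrumented
    return {label: wrap(label, block) for label, block in kernel.items()}

def run(kernel, tid, n):
    """Execute one simulated thread with n loop iterations of work."""
    state = {"tid": tid, "n": n}
    label = "entry"
    while label is not None:
        label = kernel[label](state)

def load_imbalance(counters):
    """Max over mean of dynamic basic-block counts across threads (1.0 = balanced)."""
    totals = [sum(c.values()) for c in counters]
    return max(totals) / (sum(totals) / len(totals))

# Assumed workload: four threads with uneven amounts of work.
work = [1, 2, 3, 10]
counters = [Counter() for _ in work]
instrumented_kernel = instrument(KERNEL, counters)
for tid, n in enumerate(work):
    run(instrumented_kernel, tid, n)

print([sum(c.values()) for c in counters])
print(load_imbalance(counters))
```

The sketch also hints at why the abstract attributes the largest slowdowns to kernels with many short blocks: the counter update is a fixed cost per block execution, so the shorter the block, the larger the share of time spent in instrumentation.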

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 480, OpenCL, Package, Programming Languages, PTX

August 19, 2011 by hgpu


