Fine-Grained Resource Sharing for Concurrent GPGPU Kernels

Chris Gregg, Jonathan Dorn, Kim Hazelwood, Kevin Skadron
Department of Computer Science, University of Virginia, PO Box 400740
4th USENIX Workshop on Hot Topics in Parallelism (HotPar’12), 2012





General purpose GPU (GPGPU) programming frameworks such as OpenCL and CUDA allow running individual computation kernels sequentially on a device. However, in some cases it is possible to utilize device resources more efficiently by running kernels concurrently. This raises questions about load balancing and resource allocation that have not previously been investigated: for example, which kernel characteristics impact the optimal partitioning of resources among concurrently executing kernels? Current frameworks do not provide the ability to easily run kernels concurrently with fine-grained and dynamic control over resource partitioning. We present KernelMerge, a kernel scheduler that runs two OpenCL kernels concurrently on one device. KernelMerge offers a number of settings that can be used to survey concurrent or single-kernel configurations, and to investigate how kernels interact and influence each other, or themselves. It provides a concurrent kernel scheduler compatible with the OpenCL API. We make the case for running kernels concurrently, demonstrate how to use KernelMerge to increase throughput for two kernels that use device resources efficiently when run together, and show that some kernel combinations perform worse when run concurrently. We also outline a method for using KernelMerge to investigate how concurrent kernels influence each other, with the goal of predicting concurrent runtimes from individual kernel runtimes. Finally, we suggest GPU architectural changes that would improve such concurrent schedulers in the future.
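The core idea — a scheduler that divides a device's workgroup slots between two kernels according to an adjustable partition — can be illustrated with a minimal sketch. This is not the authors' implementation: `merged_dispatch`, `partition_ratio`, and the toy kernel callables are hypothetical names used only to show how a fixed partition of workgroup slots between two logical kernels might look.

```python
# Illustrative sketch (hypothetical API, not KernelMerge itself):
# a merged dispatcher assigns each workgroup slot to one of two
# logical kernels according to a resource-partitioning ratio.

def merged_dispatch(num_workgroups, partition_ratio, kernel_a, kernel_b):
    """Run a fixed split of workgroup slots across two kernels.

    partition_ratio is the fraction of slots given to kernel_a; the
    remaining slots run kernel_b with their workgroup IDs remapped so
    each kernel sees a contiguous 0-based ID space.
    """
    cutoff = int(num_workgroups * partition_ratio)
    results = []
    for wg_id in range(num_workgroups):
        if wg_id < cutoff:
            results.append(kernel_a(wg_id))           # slot runs kernel A
        else:
            results.append(kernel_b(wg_id - cutoff))  # remapped ID for kernel B
    return results

# Example: 8 workgroup slots, split 50/50 between two toy "kernels".
out = merged_dispatch(8, 0.5, lambda i: ("A", i), lambda i: ("B", i))
# The first 4 slots execute kernel A; the last 4 execute kernel B
# with workgroup IDs remapped to 0..3.
```

Sweeping `partition_ratio` over a range of values is analogous to how a scheduler's settings can be used to survey how two kernels' throughput changes as the resource split between them varies.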
