high performance computing on graphics processing units: hgpu.org

Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling

Mihir Awatramani, Joseph Zambreno, Diane Rover

Department of Electrical and Computer Engineering, Iowa State University, Ames, Iowa, USA

International Conference on Computer Design (ICCD), 2013

@conference{MihZam13A,
  title={Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling},
  booktitle={Proceedings of the International Conference on Computer Design (ICCD)},
  year={2013},
  month={October},
  author={Mihir Awatramani and Joseph Zambreno and Diane Rover}
}

The number of active threads required to achieve peak application throughput on graphics processing units (GPUs) depends largely on the ratio of the time spent on computation to the time spent accessing data from memory. While compute-intensive applications can achieve peak throughput with a low number of threads, memory-intensive applications might not achieve good throughput even at the maximum supported thread count. In this paper, we study the effects of scheduling work from multiple applications on the same GPU core. We claim that interleaving workloads from different applications on a GPU core can improve the utilization of computational units and reduce the load on the memory subsystem. Experiments on 17 application pairs from the Rodinia benchmark suite show that overall throughput increases by 7% on average.
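The dependence of the required thread count on the compute-to-memory ratio can be illustrated with a toy latency-hiding model. This is a minimal sketch and not the model used in the paper: it assumes each thread alternates between `compute_cycles` of computation and `memory_cycles` of waiting on memory, that a core executes one thread's computation at a time, and that memory bandwidth is unlimited. The function names and parameters are hypothetical.

```python
def compute_utilization(threads, compute_cycles, memory_cycles):
    """Fraction of time the compute units are busy in the toy model.

    Assumption: each thread repeats (compute, then wait on memory), and the
    core can overlap one thread's memory wait with other threads' compute.
    """
    if threads < 0 or compute_cycles <= 0 or memory_cycles < 0:
        raise ValueError("need threads >= 0, compute_cycles > 0, memory_cycles >= 0")
    period = compute_cycles + memory_cycles
    return min(1.0, threads * compute_cycles / period)


def threads_for_peak(compute_cycles, memory_cycles):
    """Smallest thread count at which the toy model reaches full utilization."""
    if compute_cycles <= 0 or memory_cycles < 0:
        raise ValueError("need compute_cycles > 0, memory_cycles >= 0")
    period = compute_cycles + memory_cycles
    # Integer ceiling division: ceil(period / compute_cycles)
    return -(-period // compute_cycles)


if __name__ == "__main__":
    # Hypothetical cycle counts, chosen only for illustration.
    print("compute-heavy:", threads_for_peak(100, 100), "threads for peak")
    print("memory-heavy: ", threads_for_peak(10, 400), "threads for peak")
    print("memory-heavy at a 32-thread cap:",
          compute_utilization(32, 10, 400))
```

In this simplified picture, a compute-heavy workload saturates the core with few threads, while a memory-heavy workload can stay below full utilization even at a fixed thread cap. That idle compute capacity is what interleaving work from a second application would aim to use.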

Tags: Computer science, CUDA, GPGPU-sim, nVidia, Performance

September 13, 2013 by hgpu
