Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling
Department of Electrical and Computer Engineering, Iowa State University, Ames, Iowa, USA
International Conference on Computer Design (ICCD), 2013
@conference{MihZam13A,
  title={Increasing GPU Throughput using Kernel Interleaved Thread Block Scheduling},
  booktitle={Proceedings of the International Conference on Computer Design (ICCD)},
  year={2013},
  month={October},
  author={Mihir Awatramani and Joseph Zambreno and Diane Rover}
}
The number of active threads required to achieve peak application throughput on graphics processing units (GPUs) depends largely on the ratio of time spent on computation to time spent accessing data from memory. While compute-intensive applications can achieve peak throughput with a low number of threads, memory-intensive applications might not achieve good throughput even at the maximum supported thread count. In this paper, we study the effects of scheduling work from multiple applications on the same GPU core. We claim that interleaving workloads from different applications on a GPU core can improve the utilization of the computational units and reduce the load on the memory subsystem. Experiments on 17 application pairs from the Rodinia benchmark suite show that overall throughput increases by 7% on average.
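The paper proposes a hardware thread block scheduler that interleaves blocks from different kernels on each GPU core; that placement cannot be forced from software. The sketch below is only an illustrative analogue: launching a compute-bound and a memory-bound kernel in separate CUDA streams allows the hardware scheduler to co-execute them, pairing the kinds of workloads the abstract describes. The kernel bodies, sizes, and names here are hypothetical and not taken from the paper.

```cuda
// Illustrative sketch: co-executing a compute-bound and a memory-bound kernel
// via CUDA streams. This approximates, at the software level, the kind of
// kernel interleaving the paper implements in the thread block scheduler.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void computeBound(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = (float)i;
        for (int k = 0; k < 512; ++k)      // heavy arithmetic, little memory traffic
            x = x * 1.000001f + 0.5f;
        out[i] = x;
    }
}

__global__ void memoryBound(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * 2.0f;             // one load and one store per thread
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMalloc(&c, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    dim3 block(256), grid((n + 255) / 256);
    // Launching in different streams lets the GPU interleave their thread blocks.
    computeBound<<<grid, block, 0, s0>>>(c, n);
    memoryBound<<<grid, block, 0, s1>>>(a, b, n);

    cudaDeviceSynchronize();
    printf("done\n");

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Pairing kernels with complementary resource demands is the point: while the memory-bound blocks stall on loads, the compute-bound blocks keep the arithmetic units busy, which is the utilization effect the paper evaluates on Rodinia application pairs.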
September 13, 2013 by hgpu