high performance computing on graphics processing units: hgpu.org


KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications

Zejia Lin, Zewei Mo, Xuanteng Huang, Xianwei Zhang, Yutong Lu

Sun Yat-sen University, Guangzhou, China

The 41st IEEE International Conference on Computer Design (ICCD’23), 2023

@article{lin2023kesco,
  title={KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications},
  author={Lin, Zejia and Mo, Zewei and Huang, Xuanteng and Zhang, Xianwei and Lu, Yutong},
  year={2023}
}


Graphics Processing Units (GPUs) dominate a wide spectrum of computing domains, and multi-tasking is increasingly used in complex applications. To achieve higher performance, multi-task programs require cumbersome programming effort to exploit inter-kernel concurrency at the source-code level. Although existing work schedules kernels automatically to enable inter-kernel concurrency, these approaches all introduce new programming frameworks, and some even cause significant performance degradation compared with expert-written optimizations. To address this issue, we propose KeSCo, a compiler-based scheduler that exposes kernel-level concurrency in multi-task programs with trivial code modification. At compile time, KeSCo applies a strategy that schedules kernels into task queues, accounting for both load balance and synchronization cost. KeSCo also uses a customized algorithm, designed for computational flow, to remove redundant synchronizations. The design is further extended to support the multi-process scenario, in which multiple GPU processes share a single context. Evaluations on representative benchmarks show that the proposed approach achieves a 1.28x average speedup in the multi-task scenario (1.22x in the multi-process scenario). Even with reduced programming effort, the proposed design outperforms two state-of-the-art approaches, GrSched and Taskflow, by 1.31x and 1.16x on average, respectively.
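The two ideas named in the abstract, placing kernels into queues while weighing load balance against synchronization cost and then dropping redundant synchronizations, can be illustrated with a small sketch. This is not KeSCo's algorithm, which the abstract does not detail: it assumes a greedy placement heuristic and a plain transitive-reduction check as stand-ins, it runs in Python rather than inside a compiler, and every name in it (`assign_streams`, `required_syncs`, `sync_penalty`, the kernel labels and costs) is hypothetical.

```python
from collections import defaultdict


def assign_streams(kernels, deps, cost, num_streams, sync_penalty=0.25):
    """Greedy placement (an assumed heuristic, not the paper's strategy).

    kernels: kernel names in a valid topological (launch) order.
    deps:    kernel -> set of kernels it depends on.
    cost:    kernel -> estimated run time (arbitrary units).
    Each kernel goes to the queue with the lowest score, where the score
    is the queue's current load plus a penalty per cross-queue dependency.
    """
    load = [0.0] * num_streams
    stream_of = {}
    for k in kernels:
        def score(s):
            cross = sum(1 for p in deps.get(k, ()) if stream_of[p] != s)
            return load[s] + sync_penalty * cross
        best = min(range(num_streams), key=score)  # ties go to the lowest index
        stream_of[k] = best
        load[best] += cost[k]
    return stream_of


def required_syncs(kernels, deps, stream_of):
    """Return the cross-queue dependencies that still need an explicit sync.

    A dependency (p, k) is treated as redundant when k is already ordered
    after p through in-queue FIFO order and the other retained syncs.
    """
    succ = defaultdict(set)
    last_in_stream = {}
    for k in kernels:  # in-queue ordering edges (FIFO within a queue)
        s = stream_of[k]
        if s in last_in_stream:
            succ[last_in_stream[s]].add(k)
        last_in_stream[s] = k
    cross = [(p, k) for k in kernels for p in sorted(deps.get(k, ()))
             if stream_of[p] != stream_of[k]]
    for p, k in cross:
        succ[p].add(k)

    def reachable_without(src, dst, skip):
        # Depth-first search from src to dst that ignores the edge `skip`.
        stack, seen = [src], {src}
        while stack:
            n = stack.pop()
            for m in succ[n]:
                if (n, m) == skip or m in seen:
                    continue
                if m == dst:
                    return True
                seen.add(m)
                stack.append(m)
        return False

    kept = []
    for p, k in cross:
        if reachable_without(p, k, (p, k)):
            succ[p].discard(k)  # ordering is already implied; drop the sync
        else:
            kept.append((p, k))
    return kept


# Hypothetical four-kernel task graph: C needs A and B; D needs A and C.
kernels = ["A", "B", "C", "D"]
deps = {"C": {"A", "B"}, "D": {"A", "C"}}
cost = {k: 1.0 for k in kernels}

stream_of = assign_streams(kernels, deps, cost, num_streams=2)
syncs = required_syncs(kernels, deps, stream_of)
print(stream_of)  # {'A': 0, 'B': 1, 'C': 0, 'D': 1}
print(syncs)      # [('B', 'C'), ('C', 'D')]
```

In this toy graph the dependency of D on A needs no synchronization of its own: A precedes C in the same queue, and D already waits on C. In a CUDA program, each retained pair would typically correspond to an event recorded after the producer kernel and waited on by the consumer's stream.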

Tags: Benchmarking, Compilers, Computer science, CUDA, nVidia, nVidia A100, Task scheduling

December 24, 2023 by hgpu

