high performance computing on graphics processing units: hgpu.org

Syntix: A Profiling Based Resource Estimator for CUDA Kernels

Maria Papadaki, Yannis Sfakianakis, Christos Kozanitis, Angelos Bilas

FORTH-ICS, N. Plastira 100, V. Vouton, Heraklion 70013, Greece

Procedia Computer Science 156, 3-12, 2019

DOI:10.1016/j.procs.2019.08.123

Trending applications such as AI and data analytics have mandated the use of GPUs in modern datacenters for performance reasons. Current practice dedicates each GPU to a single application, which limits the number of concurrent users to the number of available GPUs. This contradicts the datacenter policy of oversubscribing resources to accommodate as many user applications as possible. To address this issue, providers will inevitably resort to GPU sharing. In this work we introduce Syntix, a mechanism that we deploy on a GPU sharing system and that 1) profiles CUDA kernels to learn their resource requirements in terms of threads and blocks and 2) assigns those resources to kernels so that they can be efficiently collocated in streams. Syntix is able to exploit the resources that would otherwise be wasted by the execution of an individual kernel, saving 80% of them on average.
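
The abstract does not spell out how the profiling step chooses a kernel's resources, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes a kernel can be profiled at several candidate block counts and picks the smallest count whose runtime stays within a tolerance of the best one observed. The names `estimate_blocks`, `saved_fraction`, `toy_profile`, and the 5% tolerance are all hypothetical, and the runtime model is made up for illustration rather than measured.

```python
from typing import Callable, Sequence


def estimate_blocks(
    candidates: Sequence[int],
    profile: Callable[[int], float],
    tolerance: float = 0.05,
) -> int:
    """Return the smallest candidate block count whose profiled runtime is
    within `tolerance` (relative) of the best runtime observed.

    `profile` is a hypothetical callback that runs the kernel with the given
    number of blocks and returns its runtime.
    """
    if not candidates:
        raise ValueError("need at least one candidate block count")
    runtimes = {blocks: profile(blocks) for blocks in candidates}
    best = min(runtimes.values())
    acceptable = [b for b, t in runtimes.items() if t <= best * (1.0 + tolerance)]
    return min(acceptable)


def saved_fraction(requested: int, assigned: int) -> float:
    """Fraction of the originally requested blocks freed for other kernels."""
    return 1.0 - assigned / requested


def toy_profile(blocks: int) -> float:
    # Hypothetical runtime model (not measured data): the runtime stops
    # improving once 8 blocks are available.
    work = 64.0
    return work / min(blocks, 8)


if __name__ == "__main__":
    requested = 64
    assigned = estimate_blocks([1, 2, 4, 8, 16, 32, 64], toy_profile)
    print(assigned, saved_fraction(requested, assigned))
```

In this toy setting the kernel requests 64 blocks but gains nothing beyond 8, so the remaining blocks could be handed to kernels collocated in other streams. A real estimator would replace `toy_profile` with actual timing runs of the kernel.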

Tags: Computer science, CUDA, GPU cluster, Grid, nVidia, nVidia Quadro K 2200

October 6, 2019 by hgpu
