high performance computing on graphics processing units: hgpu.org

Novel Methodologies for Predictable CPU-To-GPU Command Offloading

Roberto Cavicchioli, Nicola Capodieci, Marco Solieri, Marko Bertogna

Università di Modena e Reggio Emilia, Italy

31st Euromicro Conference on Real-Time Systems (ECRTS 2019), 2019

DOI:10.4230/LIPIcs.ECRTS.2019.22

Package: Artifact Evaluation for Novel methodologies for predictable CPU-to-GPU command offloading

There is increasing industrial and academic interest in a more predictable characterization of real-time tasks on high-performance heterogeneous embedded platforms, where a host system offloads parallel workloads to an integrated accelerator, such as a General Purpose Graphics Processing Unit (GP-GPU). In this paper, we analyze an important aspect that has not yet been considered in the real-time literature and that may significantly affect real-time performance if not properly treated: the time spent by the CPU submitting GP-GPU operations. We show that the impact of CPU-to-GPU kernel submissions may indeed be relevant for typical real-time workloads, and that it should be properly factored in when deriving an integrated schedulability analysis for the considered platforms. This is the case when an application is composed of many small, consecutive GPU compute/copy operations. While existing techniques mitigate this issue by batching kernel calls into a reduced number of persistent kernel invocations, in this work we present and evaluate three other approaches made possible by recently released versions of the NVIDIA CUDA GP-GPU API and by Vulkan, a novel open-standard GPU API that allows improved control of GPU command submissions. We show that this added control may significantly improve application performance and predictability through a substantial reduction in CPU-to-GPU driver interactions, making Vulkan an interesting candidate for becoming the state-of-the-art API for heterogeneous real-time systems. Our findings are evaluated on a latest-generation NVIDIA Jetson AGX Xavier embedded board, executing typical workloads involving Deep Neural Networks of parameterized complexity.

Tags: Computer science, CUDA, Heterogeneous systems, Neural networks, nVidia, Package, Vulkan

July 7, 2019 by hgpu

