high performance computing on graphics processing units: hgpu.org


Predictable GPGPU Computing in DNN-Driven Autonomous Systems

Husheng Zhou

The University of Texas at Dallas

The University of Texas at Dallas, 2018

@phdthesis{zhou2018predictable,
  title={Predictable GPGPU Computing in DNN-Driven Autonomous Systems},
  author={Zhou, Husheng},
  year={2018}
}


Graphics processing units (GPUs) are widely used as co-processors in many domains to accelerate general-purpose workloads that are data-parallel and computationally intensive, a practice known as GPGPU. An emerging usage domain is the acceleration of inherently computation-intensive Deep Neural Network (DNN) workloads in autonomous systems. Such systems are usually time-sensitive, autonomous driving systems especially so. When an autonomous vehicle drives alongside human drivers, loss of life or property may result if its computing system fails to respond to an event before the deadline. Much research has been conducted to algorithmically optimize the accuracy and performance of deep neural networks, but limited attention has been given to optimizing the execution of GPU-accelerated DNN workloads from the scheduling angle, especially in a time-constrained multi-tasking environment. Adopting GPGPU to accelerate DNN workloads in time-sensitive autonomous systems, which are often resource-constrained, presents a series of challenges: (1) GPUs are designed to execute non-preemptively, which may cause priority inversion; (2) the execution of GPU-accelerated DNN workloads must be optimized at the system level in a real-time multi-tasking environment; (3) two often conflicting goals, timing predictability and energy efficiency, both essential for any DNN-based autonomous driving system, must be achieved simultaneously on a resource-constrained embedded CPU-GPU heterogeneous platform. The goal of the research presented in this dissertation is to solve or remedy these challenges. Specifically, we propose GPES, a runtime system that allows GPU executions to be interruptible and preemptable in a multi-tasking environment. We propose S^3DNN, a systemic solution that optimizes the execution of DNN workloads on GPU in a soft real-time multi-tasking environment. We propose PredJoule, a runtime system that takes a layer-based approach, controlling timing and optimizing energy efficiency by exploiting each layer's performance/energy characteristics. In addition to these runtime systems, we investigate the problem of mapping multiple applications implemented as kernel graphs onto a heterogeneous system, and present a theoretical framework that formulates this problem as an integer program, together with a set of practically efficient mapping algorithms. Finally, we present a reuse-based approach to further improve the predictability of GPU computing.
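Challenge (1) in the abstract, priority inversion under non-preemptive GPU execution, can be illustrated with a small timing model. The sketch below is not taken from the dissertation and does not describe how GPES is implemented. It assumes a single GPU, one long low-priority kernel that starts at time 0, one high-priority kernel that arrives later, and a hypothetical scheduler that can switch tasks only at slice boundaries. The function name and all parameters are invented for illustration.

```python
import math


def high_priority_response_ms(low_len, high_len, arrival, slice_len=None):
    """Response time of a high-priority kernel on a single shared GPU.

    Assumed model (hypothetical, for illustration only):
      - a low-priority kernel of length low_len starts at time 0;
      - a high-priority kernel of length high_len arrives at time `arrival`;
      - slice_len=None means the low-priority kernel runs to completion
        (non-preemptive); otherwise it is split into slices of slice_len
        and the scheduler may switch tasks at each slice boundary.
    """
    if arrival >= low_len:
        # The GPU is already idle, so the high-priority kernel starts at once.
        return high_len
    if slice_len is None:
        # Non-preemptive: wait for the whole low-priority kernel to finish.
        start = low_len
    else:
        # Sliced: wait only until the first slice boundary at or after arrival.
        start = min(low_len, math.ceil(arrival / slice_len) * slice_len)
    return start - arrival + high_len


# A 100 ms low-priority kernel, a 5 ms high-priority kernel arriving at 12 ms.
print(high_priority_response_ms(100, 5, 12))                # 93
print(high_priority_response_ms(100, 5, 12, slice_len=10))  # 13
```

Under these assumptions, the blocking suffered by the high-priority kernel shrinks from the full remaining length of the low-priority kernel to at most one slice, which is the intuition behind making GPU executions interruptible.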

Tags: Computer science, CUDA, Deep learning, Heterogeneous systems, Neural networks, nVidia, nVidia GeForce GTX 480, nVidia GeForce GTX 620, nVidia GeForce GTX 660, nVidia Jetson TX2, nVidia Quadro 6000, Tesla K80, Thesis

May 12, 2019 by hgpu
