high performance computing on graphics processing units: hgpu.org


System Design Principles for Heterogeneous Resource Management and Scheduling in Accelerator-Based Systems

Dipanjan Sengupta

Georgia Institute of Technology

Georgia Institute of Technology, 2016

@article{sengupta2016system,
  title={System Design Principles for Heterogeneous Resource Management and Scheduling in Accelerator-Based Systems},
  author={Sengupta, Dipanjan},
  year={2016},
  publisher={Georgia Institute of Technology}
}


Accelerator-based systems are rapidly becoming platforms of choice for both high-end cloud services and irregular applications such as real-world graph analytics, owing to their high scalability and low dollar-to-FLOPS ratios. Yet GPUs are not first-class schedulable entities, which causes substantial underutilization of hardware resources, including their computational and data movement engines. Software solutions that support efficient resource management principles are therefore required to address this scheduling crisis in GPUs. Further, real-world graphs such as those in social networks have two important characteristics: they are big, and they constantly evolve over time. Their size poses a challenge because GPU-resident memory is too limited to store them. And because these large-scale graphs evolve at a high rate, it is undesirable and computationally infeasible to repeatedly run static graph analytics on a sequence of versions, or snapshots, of the evolving graph. Novel incremental solutions are therefore required to process, in near real time on GPUs, large-scale evolving graphs whose memory footprint exceeds the device’s internal memory capacity. First, the thesis presents Strings, a GPU scheduling infrastructure that achieves high system throughput and fairness among applications from multiple tenants on manycore GPU servers by treating GPUs as first-class schedulable entities and by decomposing the scheduling problem into a novel combination of load balancing and per-device resource sharing.
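The two-level decomposition described for Strings (load balancing across devices, then resource sharing within each device) can be illustrated with a small CPU-only sketch. This is not the Strings implementation: the `Device` class, the `place` and `run_device` functions, the least-loaded placement rule, and the round-robin time slice are all assumptions chosen for illustration, and no real GPU API is involved.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for one GPU and its queue of pending jobs."""
    name: str
    load: float = 0.0                       # sum of remaining estimated work
    queue: deque = field(default_factory=deque)


def place(devices, job_name, estimated_cost):
    """Level 1, load balancing: send the job to the least-loaded device.

    The least-loaded rule is an assumption of this sketch.
    """
    target = min(devices, key=lambda d: d.load)
    target.queue.append((job_name, estimated_cost))
    target.load += estimated_cost
    return target.name


def run_device(device, time_slice):
    """Level 2, per-device sharing: round-robin over the queued jobs.

    Returns the order in which jobs received a time slice.
    """
    order = []
    while device.queue:
        name, remaining = device.queue.popleft()
        ran = min(time_slice, remaining)
        order.append(name)
        device.load -= ran
        if remaining - ran > 0:
            # Unfinished jobs go to the back of the queue.
            device.queue.append((name, remaining - ran))
    return order
```

Separating the two levels means the placement policy and the sharing policy can be changed independently; a real scheduler would also have to account for data movement engines and memory pressure, which this sketch ignores.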
Second, for graph applications whose memory footprint exceeds device memory, the thesis presents GraphReduce, a highly efficient and scalable GPU-based framework that combines edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degree of parallelism in GPUs while supporting efficient graph data movement between the host and device. Finally, to address the problem of analyzing evolving graphs in near real time, the thesis presents EvoGraph, a high-performance GPU-based dynamic graph analytics framework that incrementally processes graphs on the fly using fixed-size batches of updates. To realize this vision, it introduces a novel programming model that allows a large set of incremental graph algorithms to be implemented seamlessly across multiple GPU cores. It also characterizes various graph algorithms and shows how related graph properties affect the complexity of incremental graph processing, so that the runtime can choose between an incremental and a static run over a particular update batch to achieve the best performance.
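The idea of processing update batches incrementally, with a runtime choice between an incremental and a static run, can be sketched on the CPU with connected components as the example algorithm. This is not EvoGraph's programming model: the `ComponentTracker` class, the `process_batch` function, and the decision rule (any deletion forces a static run, because union-find cannot undo a merge) are assumptions made for illustration only.

```python
class ComponentTracker:
    """Union-find over vertices 0..n-1 that counts connected components."""

    def __init__(self, num_vertices):
        self.parent = list(range(num_vertices))
        self.components = num_vertices

    def find(self, v):
        # Path halving keeps the trees shallow.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def add_edge(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv
            self.components -= 1


def static_run(num_vertices, edges):
    """Recompute components from scratch over the full edge set."""
    tracker = ComponentTracker(num_vertices)
    for u, v in edges:
        tracker.add_edge(u, v)
    return tracker


def process_batch(num_vertices, edges, tracker, batch):
    """Apply one batch of ("add" | "del", u, v) updates to an undirected graph.

    Hypothetical decision rule: insert-only batches are applied
    incrementally; a batch containing any deletion triggers a static run.
    Returns (tracker, mode).
    """
    has_deletion = any(op == "del" for op, _, _ in batch)
    for op, u, v in batch:
        key = (min(u, v), max(u, v))        # normalize undirected edges
        if op == "add":
            edges.add(key)
        else:
            edges.discard(key)
    if has_deletion:
        return static_run(num_vertices, edges), "static"
    for _, u, v in batch:
        tracker.add_edge(u, v)
    return tracker, "incremental"
```

The incremental path touches only the edges in the batch, while the static path revisits every edge; which one is cheaper depends on the algorithm and on the content of the batch, which is the kind of trade-off the abstract refers to.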

Tags: Algorithms, Cloud, Computer science, CUDA, Heterogeneous systems, nVidia, nVidia Quadro 2000, nVidia Quadro 4000, Task scheduling, Tesla C2050, Tesla C2070, Tesla K20, Thesis

September 10, 2016 by hgpu
