high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Runtime Specialization for Heterogeneous CPU-GPU Platforms

Runtime Specialization for Heterogeneous CPU-GPU Platforms

Naila Farooqui

Georgia Institute of Technology

Georgia Institute of Technology, 2016

@article{farooqui2015runtime,

title={Runtime Specialization for Heterogeneous CPU-GPU Platforms},

author={Farooqui, Naila},

year={2015},

publisher={Georgia Institute of Technology}

}

Download (PDF)

View

Source

2889

views

Heterogeneous parallel architectures like those comprised of CPUs and GPUs are a tantalizing compute fabric for performance-hungry developers. While these platforms enable order-of-magnitude performance increases for many data-parallel application domains, there remain several open challenges: (i) the distinct execution models inherent in the heterogeneous devices present on such platforms drives the need to dynamically match workload characteristics to the underlying resources, (ii) the complex architecture and programming models of such systems require substantial application knowledge and effort-intensive program tuning to achieve high performance, and (iii) as such platforms become prevalent, there is a need to extend their utility from running known regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings. The key contribution of our research is to enable runtime specialization on such hybrid CPU-GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput. Towards this end, this research: (a) enables dynamic instrumentation for GPU-based parallel architectures, specifically targeting the complex Single-Instruction Multiple-Data (SIMD) execution model, to gain real-time introspection into application behavior; (b) leverages such dynamic performance data to support novel online resource management methods that improve application performance and system throughput, particularly for irregular, input-dependent applications; (c) automates some of the programmer effort required to exercise specialized architectural features of such platforms via instrumentation-driven dynamic code optimizations; and (d) proposes a specialized, affinity-aware work-stealing scheduling runtime for integrated CPU-GPU processors that efficiently distributes work across all CPU and GPU cores for improved load balance, taking into account both application characteristics and architectural differences of the underlying devices.

Tags: ATI, ATI Radeon HD 7770, Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce GTX 480, OpenCL, PTX, Tesla C2050, Tesla M2075, Thesis

June 9, 2016 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

high performance computing on graphics processing units: hgpu.org

Runtime Specialization for Heterogeneous CPU-GPU Platforms

Your response

Recent source codes

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

Agentic Code Optimization via Compiler-LLM Cooperation

Most viewed papers (last 30 days)

Runtime Specialization for Heterogeneous CPU-GPU Platforms

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)