high performance computing on graphics processing units: hgpu.org

Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations

Vignesh T. Ravi, Wenjing Ma, David Chiu, Gagan Agrawal

Department of Computer Science and Engineering, The Ohio State University Columbus OH 43210

Proceedings of the 24th ACM International Conference on Supercomputing, ICS ’10, 2010

DOI:10.1145/1810085.1810106

@inproceedings{ravi2010compiler,
  title={Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations},
  author={Ravi, V.T. and Ma, W. and Chiu, D. and Agrawal, G.},
  booktitle={Proceedings of the 24th ACM International Conference on Supercomputing},
  pages={137--146},
  year={2010},
  organization={ACM}
}

A trend that has attracted considerable attention is the growing heterogeneity of computing platforms. It is now very common for a desktop or notebook computer to come equipped with both a multi-core CPU and a GPU. Capitalizing on the full computational power of such architectures (i.e., by simultaneously exploiting both the multi-core CPU and the GPU) starting from a high-level API is a critical challenge. We believe that it would be highly desirable to support a simple way for programmers to realize the full potential of today’s heterogeneous machines. This paper describes a compiler and runtime framework that can map a class of applications, namely those characterized by generalized reductions, to a system with a multi-core CPU and a GPU. Starting with simple C functions with added annotations, we automatically generate the middleware API code for the multi-core CPU, as well as CUDA code to exploit the GPU simultaneously. The runtime system provides efficient schemes for dynamically partitioning the work between the CPU cores and the GPU. Our experimental results from two applications, k-means clustering and Principal Component Analysis (PCA), show that, by effectively harnessing the heterogeneous architecture, we can achieve significantly higher performance than when using only the GPU or only the multi-core CPU. In k-means, the heterogeneous version with 8 CPU cores and a GPU achieved a speedup of about 32.09x relative to the 1-thread CPU version. Compared to the faster of the CPU-only and GPU-only executions, this is a performance gain of about 60%. In PCA, the heterogeneous version attained a speedup of 10.4x relative to the 1-thread CPU version. Compared to the faster of the CPU-only and GPU-only versions, this is a performance gain of about 63.8%.
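To make the generalized reduction pattern and the dynamic work partitioning concrete, here is a minimal sketch of one k-means step in Python. It is not the code the framework generates: plain Python threads stand in for the CPU cores and the GPU, and every name (`accumulate`, `merge`, `worker`, `kmeans_step`, the chunk size) is a hypothetical choice made for illustration. Each worker pulls chunks from a shared queue, so a faster device would naturally take more chunks, and updates its own private reduction object. The private objects are merged once at the end.

```python
import threading
from queue import Queue, Empty


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))


def new_reduction_object(k, dim):
    """Per-cluster coordinate sums and point counts, all zero."""
    return {"sums": [[0.0] * dim for _ in range(k)], "counts": [0] * k}


def accumulate(robj, point, centers):
    """Process one element: fold it into the reduction object in place."""
    c = nearest(point, centers)
    robj["counts"][c] += 1
    for d, x in enumerate(point):
        robj["sums"][c][d] += x


def merge(dst, src):
    """Combine two reduction objects; addition is associative and commutative."""
    for c in range(len(dst["counts"])):
        dst["counts"][c] += src["counts"][c]
        for d in range(len(dst["sums"][c])):
            dst["sums"][c][d] += src["sums"][c][d]


def worker(chunks, centers, robj):
    """Pull chunks until the queue is empty (dynamic partitioning)."""
    while True:
        try:
            chunk = chunks.get_nowait()
        except Empty:
            return
        for point in chunk:
            accumulate(robj, point, centers)


def kmeans_step(points, centers, n_workers=3, chunk_size=4):
    """One k-means iteration expressed as a generalized reduction."""
    k, dim = len(centers), len(centers[0])
    chunks = Queue()
    for i in range(0, len(points), chunk_size):
        chunks.put(points[i:i + chunk_size])

    # One private reduction object per worker avoids locking on updates.
    robjs = [new_reduction_object(k, dim) for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(chunks, centers, r))
               for r in robjs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Global combination of the private reduction objects.
    total = new_reduction_object(k, dim)
    for r in robjs:
        merge(total, r)

    # Empty clusters keep their previous center.
    new_centers = [
        [s / total["counts"][c] for s in total["sums"][c]]
        if total["counts"][c] else list(centers[c])
        for c in range(k)
    ]
    return new_centers, total["counts"]
```

Because the merge operation is associative and commutative, the result does not depend on which worker processed which chunk. That property is what allows the work to be split between devices of different speeds at run time.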

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce 9800 GTX

August 22, 2011 by hgpu
