high performance computing on graphics processing units: hgpu.org

Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures

Xiao-Long Wu, Nady Obeid, Wen-Mei Hwu

Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

10th IEEE International Conference on Computer and Information Technology, 2010

DOI:10.1109/CIT.2010.213

@conference{wu2010exploiting,
  title={Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures},
  author={Wu, X.L. and Obeid, N. and Hwu, W.M.},
  booktitle={2010 10th IEEE International Conference on Computer and Information Technology (CIT 2010)},
  pages={1175--1180},
  year={2010},
  organization={IEEE}
}

Reduction is a common component of many applications, but can often be the limiting factor for parallelization. Previous work on reductions has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communication or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a method for exploiting more parallelism by isolating the reduction from users of the intermediate results. The other main contribution of our work is enabling the parallelization of more complex reduction codes, including those that involve the use of intermediate reduction results. The proposed transformations are often implemented by programmers in an ad-hoc manner, but to the best of our knowledge no previous work has been proposed to automate these transformations for many-core architectures. Using two benchmark applications, we show that the automatic transformations can yield significant speedups over the original code.
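As a rough illustration of what "isolating the reduction from users of the intermediate results" can mean, the sketch below splits one sequential loop into a reduction phase and a consumer phase. It is not taken from the paper: the function names `fused_loop` and `isolated_loops`, the running-sum reduction, and the multiply-by-running-total consumer are all assumptions chosen for brevity, and plain Python stands in for GPU code.

```python
from itertools import accumulate


def fused_loop(weights, values):
    """Hypothetical original loop: the reduction update and the use of its
    intermediate result sit in the same iteration, so the loop-carried
    dependence on `running` serializes everything."""
    running = 0
    out = []
    for w, v in zip(weights, values):
        running += w             # reduction update
        out.append(v * running)  # consumer of the intermediate result
    return out


def isolated_loops(weights, values):
    """Hypothetical transformed form: the reduction is pulled into its own
    phase, leaving a consumer phase with no loop-carried dependence."""
    # Phase 1: reduction only. It records every intermediate result
    # (an inclusive scan), which has well-known parallel formulations.
    partial = list(accumulate(weights))
    # Phase 2: consumers only. Each element depends solely on its own
    # inputs, so on a GPU each could be handled by a separate thread.
    return [v * p for v, p in zip(values, partial)]
```

Both functions compute the same result; the difference is that the second form exposes the consumer work as independent iterations, which is the kind of extra parallelism the abstract refers to.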

Tags: Algorithms, Computer science, CUDA, nVidia, Optimization

January 24, 2011 by hgpu
