Program Optimization Strategies for Data-Parallel Many-Core Processors

Shane Ryoo

University of Illinois at Urbana-Champaign

University of Illinois at Urbana-Champaign, 2008

@phdthesis{ryoo2008program,
  title={Program optimization strategies for data-parallel many-core processors},
  author={Ryoo, S.},
  year={2008},
  publisher={Citeseer}
}

Program optimization for highly parallel systems has historically been considered an art, with experts doing much of the performance tuning by hand. With the introduction of inexpensive, single-chip, massively parallel platforms, more developers will be creating highly data-parallel applications for these platforms while lacking the substantial experience and knowledge needed to maximize application performance. In addition, hand-optimization even by motivated and informed developers takes a significant amount of time and generally still underutilizes the performance of the hardware by double-digit percentages. This creates a need for structured and automatable optimization techniques that are capable of finding a near-optimal program configuration for this new class of architecture. My work discusses various strategies for optimizing programs on a highly data-parallel architecture with fine-grained sharing of resources. I first investigate useful strategies in optimizing a suite of applications. I then introduce program optimization carving, an approach that discovers high-performance application configurations for data-parallel, many-core architectures. Instead of applying a particular phase ordering of optimizations, it starts with an optimization space of major transformations and then reduces the space by examining the static code and pruning configurations that do not maximize desirable qualities in isolation or combination. Careful selection of pruning criteria for applications running on the NVIDIA GeForce 8800 GTX reduces the optimization space by as much as 98% while finding configurations within 1% of the best performance. Random sampling, in contrast, can require nearly five times as many configurations to find performance within 10% of the best. I also examine the technique’s effectiveness when varying pruning criteria.
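The pruning step described in the abstract can be illustrated with a small sketch. It assumes two statically estimated qualities, here called efficiency and utilization, and keeps only configurations that no other configuration beats on both at once. The metric names, the formulas, the `Config` fields, and the constants are all hypothetical stand-ins chosen for illustration; the abstract does not specify them, and this is not presented as the thesis's actual implementation.

```python
from itertools import product
from typing import NamedTuple


class Config(NamedTuple):
    tile: int    # hypothetical tile width
    unroll: int  # hypothetical loop unroll factor


def static_metrics(cfg):
    """Return (efficiency, utilization); both are toy estimates, higher is better."""
    # Toy instruction count: bigger tiles and more unrolling mean less work.
    instructions = 1000 / cfg.tile + 200 / cfg.unroll + 50
    # Toy register pressure: the same transformations cost registers,
    # which lowers how many thread blocks fit on a multiprocessor at once.
    registers = 10 + 2 * cfg.unroll + cfg.tile // 8
    resident_blocks = 32 // registers
    return 1.0 / instructions, float(resident_blocks)


def dominates(a, b):
    """True if metric vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def carve(space):
    """Keep only configurations that no other configuration dominates."""
    scored = {cfg: static_metrics(cfg) for cfg in space}
    return [
        cfg for cfg, m in scored.items()
        if not any(dominates(other, m) for other in scored.values())
    ]


space = [Config(t, u) for t, u in product((8, 16, 32), (1, 2, 4))]
survivors = carve(space)
print(f"{len(survivors)} of {len(space)} configurations left to time on hardware")
for cfg in survivors:
    print(cfg, static_metrics(cfg))
```

With these toy formulas, two of the nine configurations survive: one that maximizes efficiency and one that trades some efficiency for higher utilization. Only the survivors would then be compiled and timed, which is where the reduction in search cost would come from.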

Tags: Computer science, CUDA, nVidia, nVidia GeForce 8800 GTX, Optimization, Thesis

January 24, 2011 by hgpu
