high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » A Tuning Framework for Software-Managed Memory Hierarchies

A Tuning Framework for Software-Managed Memory Hierarchies

Manman Ren, Ji Young Park, Mike Houston, Alex Aiken, William J. Dally

Stanford University

Proceedings of the 17th international conference on Parallel architectures and compilation techniques, 2008, p.280-291

BibTeX

Download (PDF)

View

Source

2455

views

New architectures are emerging at a rapid pace, architectures with multiple processing units on a chip and with deep memory hierarchies have become pervasive; while architectures with software-managed memory hierarchies (such as the Sony/Toshiba/IBM Cell processor) have gained popularity. Due to the increased complexity of architectures, re-targeting a legacy application to a new architecture requires lots of time porting and tuning. To achieve both portability and high performance on modern machines, we propose a programming environment that includes a portable language (Sequoia), a portable runtime and a tuning framework. In this thesis, we focus on the design and implementation of the tuning framework. Achieving good performance on a modern machine with a multi-level memory hierarchy, and in particular on a machine with software-managed memories, requires the meticulous tuning of programs to the machine’s particular characteristics. Further, the choices made when tuning a program for one machine will typically be very different to those made when tuning the same program for a different machine. A large program on a multi-level machine can easily expose tens or hundreds of inter-dependent parameters which require tuning, ranging (for example) from subarray sizes to compiler flags to loop optimizations to decomposition strategies, and manually searching the resultant large, non-linear space of program parameters is a tedious process of trial-and-error. These challenges entail the design of an automatic tuning framework. In this dissertation, we present a general framework for automatically tuning arbitrary applications to machines with software-managed memory hierarchies. The tuning framework matches the decomposition strategies to the memory hierarchies. It uses a search algorithm, I specialized to software-managed memory hierarchies, that achieves good performance quickly due to the smoothness of the search space. The framework also applies a novel fusion algorithm that considers multiple outermost loop levels in a single step. The knowledge learned when searching the tunable space is used to guide the selection of a fusion configuration. We evaluate our framework by measuring the performance of benchmarks that are tuned for a range of machines with different memory hierarchy configurations: a cluster of Intel P4 Xeon processors, a single Cell processor and a cluster of Sony Playstation 3s. The tuning framework gives similar or better performance than what is achieved by the best-available hand-tuned version coded in Sequoia.

Tags: Algorithms, Cell processor, Compilers, Computer science, Optimization, Performance, Playstation, Programming Languages

February 26, 2011 by hgpu

No votes yet.

Please wait...

high performance computing on graphics processing units: hgpu.org

A Tuning Framework for Software-Managed Memory Hierarchies

Recent source codes

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

SYCL Container

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)

A Tuning Framework for Software-Managed Memory Hierarchies

Share this:

Recent source codes

Most viewed papers (last 30 days)