high performance computing on graphics processing units: hgpu.org

Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories

Muthu Manikandan Baskaran, Uday Bondhugula, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan

Department of Computer Science and Engineering, The Ohio State University, 2015 Neil Ave. Columbus, OH, USA

In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2008), pp. 1-10

DOI:10.1145/1345206.1345210

@conference{baskaran2008automatic,
  title={Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories},
  author={Baskaran, M.M. and Bondhugula, U. and Krishnamoorthy, S. and Ramanujam, J. and Rountev, A. and Sadayappan, P.},
  booktitle={Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
  pages={1--10},
  year={2008},
  organization={ACM}
}

Several parallel architectures such as GPUs and the Cell processor have fast explicitly managed on-chip memories, in addition to slow off-chip memory. They also have very high computational power with multiple levels of parallelism. A significant challenge in programming these architectures is to effectively exploit the parallelism available in the architecture and manage the fast memories to maximize performance. In this paper we develop an approach to effective automatic data management for on-chip memories, including creation of buffers in on-chip (local) memories for holding portions of data accessed in a computational block, automatic determination of array access functions of local buffer references, and generation of code that moves data between slow off-chip memory and fast local memories. We also address the problem of mapping computation in regular programs to multi-level parallel architectures using a multi-level tiling approach, and study the impact of on-chip memory availability on the selection of tile sizes at various levels. Experimental results on a GPU demonstrate the effectiveness of the proposed approach.
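
To make the three data-management steps named in the abstract concrete (creating local buffers, rewriting array access functions, and generating copy code), here is a minimal hand-written sketch in Python. It is not the paper's compiler output or algorithm: plain Python lists stand in for off-chip and on-chip memory, the computation is a simple tiled matrix multiply chosen only for illustration, and every name (`copy_in`, `copy_out`, `tiled_matmul`, `local_footprint`) is hypothetical.

```python
# Illustrative sketch only: Python lists stand in for slow off-chip ("global")
# memory and for small on-chip ("local") buffers. All names are hypothetical.

def copy_in(global_mat, row0, col0, tile):
    """Copy the tile x tile block starting at (row0, col0) into a new local buffer."""
    return [[global_mat[row0 + i][col0 + j] for j in range(tile)]
            for i in range(tile)]


def copy_out(local_buf, global_mat, row0, col0):
    """Write a local buffer back to the block of global_mat starting at (row0, col0)."""
    for i, row in enumerate(local_buf):
        for j, value in enumerate(row):
            global_mat[row0 + i][col0 + j] = value


def tiled_matmul(a, b, n, tile):
    """Compute C = A * B for n x n matrices using explicit local buffers."""
    assert n % tile == 0, "this sketch assumes the tile size divides n"
    c = [[0] * n for _ in range(n)]
    # Outer tile loops: in a multi-level mapping these would be the loops
    # distributed across parallel units.
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            c_loc = [[0] * tile for _ in range(tile)]
            for k0 in range(0, n, tile):
                a_loc = copy_in(a, i0, k0, tile)  # global -> local
                b_loc = copy_in(b, k0, j0, tile)  # global -> local
                # Access functions are shifted by the tile origin:
                # global a[i][k] becomes local a_loc[i - i0][k - k0].
                for i in range(i0, i0 + tile):
                    for j in range(j0, j0 + tile):
                        for k in range(k0, k0 + tile):
                            c_loc[i - i0][j - j0] += (
                                a_loc[i - i0][k - k0] * b_loc[k - k0][j - j0]
                            )
            copy_out(c_loc, c, i0, j0)  # local -> global
    return c


def local_footprint(tile, buffers=3):
    """Elements resident in local memory at once (three tile x tile buffers here)."""
    return buffers * tile * tile
```

The footprint helper hints at why on-chip capacity constrains tile-size selection: in this sketch a tile size of 16 keeps 768 elements resident at once, so the largest usable tile is bounded by the local memory available to each parallel unit.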

Tags: Algorithms, Code generation, Compilers, Computer science, CUDA, nVidia, nVidia GeForce 8800 GTX, Optimization, Performance

February 28, 2011 by hgpu
