high performance computing on graphics processing units: hgpu.org

A Task-centric Memory Model for Scalable Accelerator Architectures

John H. Kelm, Daniel R. Johnson, Steven S. Lumetta, Matthew I. Frank, Sanjay J. Patel

University of Illinois at Urbana-Champaign, Urbana, IL 61801

18th International Conference on Parallel Architectures and Compilation Techniques, 2009, PACT 2009, pp.77-87

DOI:10.1109/PACT.2009.16

@conference{kelm2009task,
  title={A task-centric memory model for scalable accelerator architectures},
  author={Kelm, J.H. and Johnson, D.R. and Lumetta, S.S. and Frank, M.I. and Patel, S.J.},
  booktitle={2009 18th International Conference on Parallel Architectures and Compilation Techniques},
  pages={77--87},
  issn={1089-795X},
  year={2009},
  organization={IEEE}
}

This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit 1000-core processors. In these workloads, we observe data sharing and communication patterns that can be leveraged in the design of memory systems for future 1000-core processors. Based on these insights, we propose a memory model that uses a software protocol, working in collaboration with hardware caches, to maintain a coherent, single-address space view of memory without the need for hardware coherence support. We evaluate the task-centric memory model in simulation on a 1024-core MIMD accelerator we are developing that, with the help of a runtime system, implements the proposed memory model. We evaluate coherence management policies related to the task-centric memory model and show that the overhead of maintaining a coherent view of memory in software can be minimal. We further show that, while software management may constrain speculative hardware prefetching into local caches, a common optimization, it does not constrain the more relevant use case of off-chip prefetching from DRAM into shared caches.
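To make the idea of software-managed coherence concrete, the following is a minimal toy sketch, not the protocol from the paper. It assumes one possible policy: each core invalidates its private cache when a task begins and writes back its dirty lines when the task ends, so data is exchanged only at task boundaries. The names `GlobalMemory`, `LocalCache`, and `run_task` are hypothetical and chosen only for illustration.

```python
class GlobalMemory:
    """Stand-in for the shared level of the hierarchy (assumption: a flat address map)."""

    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value


class LocalCache:
    """Private write-back cache with no hardware coherence."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # addr -> locally cached value
        self.dirty = set()  # addrs written locally and not yet written back

    def load(self, addr):
        # A hit returns the cached copy, which may be stale.
        if addr not in self.lines:
            self.lines[addr] = self.memory.read(addr)
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self):
        # Software write-back: publish every dirty line to shared memory.
        for addr in self.dirty:
            self.memory.write(addr, self.lines[addr])
        self.dirty.clear()

    def invalidate(self):
        # Software invalidate: drop all cached copies.
        # Assumes flush() already ran, so no dirty data is lost.
        self.lines.clear()
        self.dirty.clear()


def run_task(cache, task):
    """Hypothetical runtime hook: coherence actions happen only at task boundaries."""
    cache.invalidate()    # task begin: discard possibly stale lines
    result = task(cache)
    cache.flush()         # task end: make this task's writes visible
    return result


memory = GlobalMemory()
core0, core1 = LocalCache(memory), LocalCache(memory)

core1.load(0)                                      # core1 caches address 0 early
run_task(core0, lambda c: c.store(0, 42))          # producer task on core0
stale_read = core1.load(0)                         # outside a task: old cached copy
fresh_read = run_task(core1, lambda c: c.load(0))  # consumer task sees the update
```

In this sketch the consumer observes the producer's write only because its task begins with an invalidate; the read issued outside a task boundary still returns the old cached value. A real runtime would likely track which lines to flush or invalidate at a finer granularity, which is the kind of policy trade-off the abstract says the paper evaluates.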

Tags: Algorithms, Computer science, Hardware Architecture, Memory model

April 19, 2011 by hgpu
