
Affine Vector Cache for memory bandwidth savings

Sylvain Collange, Alexandre Kouyoumdjian
ARÉNAIRE (Inria Grenoble Rhône-Alpes / LIP Laboratoire de l'Informatique du Parallélisme), INRIA – CNRS : UMR 5668 – Université Claude Bernard – Lyon I – École Normale Supérieure de Lyon
HAL – Inria, Report ensl-00649200, 2011

@techreport{collange2011affine,
   title={Affine Vector Cache for memory bandwidth savings},
   author={Collange, Sylvain and Kouyoumdjian, Alexandre},
   institution={HAL -- Inria},
   number={ensl-00649200},
   year={2011}
}


Preserving memory locality is a major issue in highly multithreaded architectures such as GPUs. These architectures hide latency by keeping a large number of threads in flight. As each thread needs to maintain a private working set, all threads collectively put tremendous pressure on on-chip memory arrays, at significant cost in area and power. We show that thread-private data in GPU-like implicit SIMD architectures can be compressed by a factor of up to 16 by taking advantage of correlations between the values held by different threads. We propose the Affine Vector Cache (AVC), a compressed cache design that complements the first-level cache. Evaluation by simulation on the NVIDIA CUDA SDK and Rodinia benchmarks shows that a 32KB L1 cache assisted by a 16KB AVC offers a 59% larger usable capacity on average than a single 48KB L1 cache. This results in a global performance increase of 5.7% along with an energy reduction of 11%, at negligible hardware cost.
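To illustrate the kind of inter-thread correlation the AVC exploits, consider that many per-thread values in implicit SIMD execution are affine in the lane index: lane i holds base + i × stride (thread identifiers, array addresses, loop induction variables). Assuming 32 lanes and 4-byte words, such a 128-byte vector collapses to an 8-byte (base, stride) pair, which matches the factor-of-16 compression cited above. The C++ sketch below is a minimal illustration of this encoding and per-lane decoding, not the paper's actual hardware design; all struct and function names are hypothetical, and non-affine vectors would simply remain uncompressed in the regular L1.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>

// Hypothetical sketch: affine-vector compression across SIMD lanes.
// A 32-lane vector of 32-bit words (128 bytes) whose lanes follow
// value[i] = base + i * stride collapses to an 8-byte (base, stride) pair.
constexpr int kLanes = 32;

struct AffineVector {
    uint32_t base;    // value held by lane 0
    int32_t  stride;  // constant per-lane increment
};

// Try to encode a full vector as an affine (base, stride) pair.
// Returns std::nullopt when the lanes are not affine, in which case the
// vector would have to be kept uncompressed (e.g. in the regular L1 cache).
std::optional<AffineVector> try_encode(const uint32_t lanes[kLanes]) {
    int32_t stride = static_cast<int32_t>(lanes[1] - lanes[0]);
    for (int i = 2; i < kLanes; ++i) {
        if (static_cast<int32_t>(lanes[i] - lanes[i - 1]) != stride) {
            return std::nullopt;  // pattern broken: not an affine vector
        }
    }
    return AffineVector{lanes[0], stride};
}

// Reconstruct lane i's value from the compressed representation.
uint32_t decode_lane(const AffineVector& v, int lane) {
    return v.base + static_cast<uint32_t>(lane * v.stride);
}

int main() {
    // Example: per-thread addresses of consecutive 4-byte array elements.
    uint32_t addrs[kLanes];
    for (int i = 0; i < kLanes; ++i) addrs[i] = 0x1000 + 4 * i;

    if (auto enc = try_encode(addrs)) {
        std::printf("affine: base=0x%x stride=%d (128 B -> 8 B)\n",
                    enc->base, enc->stride);
        std::printf("lane 7 decodes to 0x%x\n", decode_lane(*enc, 7));
    }
    return 0;
}
```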
