high performance computing on graphics processing units: hgpu.org

An Energy Efficient GPGPU Memory Hierarchy with Tiny Incoherent Caches

Alamelu Sankaranarayanan, Ehsan K. Ardestani, Jose Luis Briz, Jose Renau

Dept. of Computer Engineering, University of California Santa Cruz

International Symposium on Low Power Electronics and Design (ISLPED), 2013

With each generation and the ever-increasing promise of computing power, GPGPUs have been growing quickly in size, and energy consumption has become a major bottleneck for them. The first-level data cache and the scratchpad memory are critical to the performance of a GPGPU, but they are extremely energy inefficient because of the large number of cores they need to serve. This problem could be mitigated by introducing a cache higher up in the hierarchy that services fewer cores, but doing so introduces cache coherence issues that may become very significant, especially for a GPGPU with hundreds of thousands of in-flight threads. In this paper, we propose adding incoherent tinyCaches between each lane in an SM and the first-level data cache that is currently shared by all the lanes in that SM. In a normal multiprocessor, this would require hardware cache coherence across all the SM lanes, capable of handling hundreds of thousands of threads. Our incoherent tinyCache architecture exploits certain unique features of the CUDA/OpenCL programming model to avoid complex coherence schemes. The tinyCache filters out 62% of the memory requests that would otherwise need to be serviced by the DL1G, and almost 81% of scratchpad memory requests, allowing us to achieve a 37% energy reduction in the on-chip memory hierarchy. We evaluate the tinyCache for different memory patterns and show that it is beneficial in most cases.
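To make the filtering idea concrete, here is a minimal Python sketch of a small per-lane cache placed in front of a shared first-level cache, counting the fraction of requests it absorbs. It is an illustration only, not the paper's design: the `TinyCache` class, the `filter_rate` helper, the 4-line fully associative LRU organization, the 32-byte line size, and the flush-everything `invalidate` policy are all assumptions made for this example, and the abstract does not say which CUDA/OpenCL features the real tinyCache relies on.

```python
from collections import OrderedDict


class TinyCache:
    """Hypothetical per-lane cache: fully associative, LRU, a few lines.

    Sizes and policies are illustrative assumptions, not the paper's values.
    """

    def __init__(self, num_lines=4, line_bytes=32):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.lines = OrderedDict()  # line tag -> None, ordered by recency
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and evict the LRU one."""
        tag = addr // self.line_bytes
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1  # this request would go on to the shared cache
        self.lines[tag] = None
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)  # evict least recently used
        return False

    def invalidate(self):
        """Drop all lines. Assumed policy: with no hardware coherence, stale
        data is avoided by flushing at synchronization points."""
        self.lines.clear()


def filter_rate(trace, num_lanes=32, **cache_args):
    """Fraction of requests in `trace` (pairs of lane, address) that hit in
    the per-lane caches and so never reach the shared first-level cache."""
    caches = [TinyCache(**cache_args) for _ in range(num_lanes)]
    for lane, addr in trace:
        caches[lane].access(addr)
    hits = sum(c.hits for c in caches)
    total = hits + sum(c.misses for c in caches)
    return hits / total if total else 0.0


# Four lanes, each reading eight consecutive 4-byte words from its own line:
# one miss to fetch the line, then seven hits per lane.
trace = [(lane, lane * 32 + 4 * i) for lane in range(4) for i in range(8)]
print(filter_rate(trace, num_lanes=4))  # 0.875
```

The synthetic trace has perfect per-lane spatial locality, so its filter rate says nothing about the 62% and 81% figures reported above, which come from the authors' own evaluation. Real filter rates depend on the access pattern.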

Tags: Computer science, CUDA, Energy-efficient computing, GPGPU-sim, Memory, nVidia, OpenCL

June 24, 2013 by hgpu
