high performance computing on graphics processing units: hgpu.org

Adaptive and Transparent Cache Bypassing for GPUs

Ang Li, Gert-Jan van den Braak, Akash Kumar, Henk Corporaal

Eindhoven University of Technology, Eindhoven, The Netherlands

The International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2015

In the last decade, GPUs have become widely adopted for general-purpose applications. To capture on-chip locality for these applications, modern GPUs have integrated a multilevel cache hierarchy in an attempt to reduce the amount and latency of the massive and sometimes irregular memory accesses. However, performance is frequently poor because the huge number of concurrent threads causes severe congestion in the caches. In this paper, we propose a novel compile-time framework for adaptive and transparent cache bypassing on GPUs. It uses a simple yet effective approach to control the bypass degree so that it matches the size of applications’ runtime footprints. We validate the design on seven GPU platforms that cover all existing GPU generations, using 16 applications from widely used GPU benchmarks. Experiments show that our design can significantly mitigate the negative impact of small cache sizes and improve overall performance. We analyze the performance across different platforms and applications. We also propose some optimization guidelines on how to use GPU caches efficiently.
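To illustrate why matching the bypass degree to the runtime footprint can help, here is a toy model, not the authors' framework. It assumes a fully associative LRU cache, warps that each reuse a private working set, round-robin interleaving of warp accesses, and a hypothetical warp-ID threshold as the bypass control: warps below the threshold use the cache, the rest bypass it. On real NVIDIA hardware an L1 bypass for a global load can be expressed in PTX with the `ld.global.cg` cache operator. The names `LRUCache`, `simulate`, and `best_bypass_degree` are illustrative only.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache that tracks whole cache lines."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.lines[line] = True
            if len(self.lines) > self.num_lines:
                self.lines.popitem(last=False)  # evict least recently used


def simulate(num_warps, lines_per_warp, passes, cache_lines, cached_warps):
    """Return (hits, misses) in the cache when only warps with id < cached_warps
    use it; accesses from the other warps bypass the cache and are not counted."""
    cache = LRUCache(cache_lines)
    for _ in range(passes):                      # each warp reuses its data `passes` times
        for i in range(lines_per_warp):
            for w in range(num_warps):           # round-robin interleaving of warps
                if w < cached_warps:             # hypothetical warp-ID bypass threshold
                    cache.access((w, i))
    return cache.hits, cache.misses


def best_bypass_degree(num_warps, lines_per_warp, passes, cache_lines):
    """Pick the number of caching warps that maximizes cache hits (first on ties)."""
    return max(
        range(num_warps + 1),
        key=lambda k: simulate(num_warps, lines_per_warp, passes, cache_lines, k)[0],
    )


if __name__ == "__main__":
    # 8 warps x 4 lines = 32-line footprint competing for a 16-line cache.
    print(simulate(8, 4, 4, 16, 8))        # all warps cached: the cache thrashes
    print(simulate(8, 4, 4, 16, 4))        # half the warps bypass: the footprint fits
    print(best_bypass_degree(8, 4, 4, 16))
```

With all eight warps caching, the 32-line cyclic footprint exceeds the 16-line LRU cache and every access misses (`(0, 128)`). Letting only four warps cache brings the footprint down to exactly 16 lines, so every reuse after the first pass hits (`(48, 16)`), and the search returns 4. A real GPU cache is set-associative and warp scheduling is not strictly round-robin, so this only conveys the intuition.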

Tags: Benchmarking, cache, Computer science, Memory, nVidia, nVidia GeForce GTX 460, nVidia GeForce GTX 570, nVidia GeForce GTX 690, nVidia GeForce GTX 750 Ti, nVidia GeForce GTX 980, Performance, PTX, Tesla K40, Tesla K80

September 24, 2015 by hgpu
