high performance computing on graphics processing units: hgpu.org

CuMAPz: a tool to analyze memory access patterns in CUDA

Yooseong Kim, Aviral Shrivastava

Compiler and Microarchitecture Laboratory, Arizona State University, Tempe 85281, USA

Proceedings of the 48th Design Automation Conference, DAC ’11, 2011

DOI:10.1145/2024724.2024754

The CUDA programming model provides a simple interface for programming GPUs, but tuning GPGPU applications for high performance is still quite challenging. Programmers need to consider several architectural details, and small changes in source code, especially to the memory access pattern, affect performance significantly. This paper presents CuMAPz, a tool to compare the memory performance of a CUDA program. CuMAPz can help programmers explore different ways of using shared and global memory and optimize their programs for memory behavior. CuMAPz models several memory effects, e.g., data reuse, global memory access coalescing, shared memory bank conflicts, channel skew, and branch divergence. By using CuMAPz to explore the memory access design space, we improved the performance of our benchmarks by 62% over the naive cases and by 32% over the previous approach [8].
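
To make two of the effects named in the abstract concrete, the following is a minimal Python sketch of how global memory coalescing and shared memory bank conflicts can be estimated from the addresses a group of threads touches. It is not CuMAPz's model, which the abstract does not describe in detail. The function names, the group size of 16 threads, the 128-byte segment size, and the 16 four-byte-wide banks are all illustrative assumptions, and real hardware rules vary by GPU generation.

```python
# Toy estimators for two memory effects (illustrative only, not CuMAPz itself).
# All parameters below are assumptions chosen for the example.

def count_transactions(addresses, segment_bytes=128):
    """Count the distinct aligned segments touched by one group of threads.

    Fewer segments means the accesses coalesce into fewer transactions.
    """
    return len({addr // segment_bytes for addr in addresses})


def bank_conflict_degree(addresses, num_banks=16, bank_width=4):
    """Return the largest number of distinct words that map to one bank.

    A degree of 1 means conflict-free; a degree of n means the access is
    serialized into roughly n steps. Threads reading the same word are
    counted once (treated as a broadcast).
    """
    words_per_bank = {}
    for addr in addresses:
        word = addr // bank_width
        words_per_bank.setdefault(word % num_banks, set()).add(word)
    return max(len(words) for words in words_per_bank.values())


GROUP = 16  # threads serviced together (assumed)
WORD = 4    # bytes per element (assumed: 32-bit values)

# Byte addresses for three access patterns, indexed by thread id.
unit_stride = [tid * WORD for tid in range(GROUP)]       # a[tid]
strided = [tid * 16 * WORD for tid in range(GROUP)]      # a[tid * 16]
padded = [tid * 17 * WORD for tid in range(GROUP)]       # a[tid * 17]

print(count_transactions(unit_stride))    # 1
print(count_transactions(strided))        # 8
print(bank_conflict_degree(unit_stride))  # 1
print(bank_conflict_degree(strided))      # 16
print(bank_conflict_degree(padded))       # 1
```

Under these assumptions, a stride of 16 words spreads one group's global accesses over 8 segments instead of 1 and sends every shared memory access to the same bank, while padding the stride to 17 words removes the bank conflicts. These are the kinds of source-level changes whose memory cost the abstract says is hard to predict by hand.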

Tags: Analytical model, Benchmarking, Computer science, CUDA, Memory, nVidia, Performance, Tesla C1060

September 20, 2011 by hgpu
