high performance computing on graphics processing units: hgpu.org

The Hierarchical Memory Machine Model for GPUs

Koji Nakano

Department of Information Engineering, Hiroshima University

International Parallel and Distributed Processing Symposium Workshops, 2013

The Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM) are theoretical parallel computing models that capture the essence of shared memory access and global memory access on GPUs, respectively. The main contribution of this paper is the Hierarchical Memory Machine (HMM), which consists of multiple DMMs and a single UMM. The HMM is a more practical parallel computing model that reflects the architecture of current GPUs. We present several fundamental algorithms on the HMM. First, we show that the sum of n numbers can be computed in O(n/w + nl/p + l + log n) time units using p threads on the HMM with width w and latency l, and we prove that this computing time is optimal. We also show that the direct convolution of m and (m+n-1) numbers can be done in O(n/w + mn/(dw) + nl/p + l + log m) time units using p threads on the HMM with d DMMs, width w, and latency l. Finally, we prove that our implementation of the direct convolution is time optimal.
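To make the two bounds in the abstract easier to compare, here is a minimal Python sketch that evaluates their terms for concrete parameters and gives a sequential reference for the direct convolution. It is not the paper's HMM implementation. The function names are hypothetical, the hidden constants of the O-notation are dropped (so the values are useful only for comparing growth, not as real timings), the logarithm is taken base 2, and the convolution is assumed to follow the common definition c[i] = a[0]*b[i] + ... + a[m-1]*b[i+m-1] for 0 <= i < n, which is consistent with inputs of m and m+n-1 numbers.

```python
import math


def sum_time_bound(n, w, l, p):
    """Terms of the summation bound n/w + n*l/p + l + log n (constants dropped).

    n: input size, w: width, l: latency, p: number of threads.
    Base-2 logarithm is an assumption; the base does not affect the O-bound.
    """
    return n / w + n * l / p + l + math.log2(n)


def convolution_time_bound(n, m, w, l, p, d):
    """Terms of the convolution bound n/w + m*n/(d*w) + n*l/p + l + log m.

    d is the number of DMMs; the other parameters are as in sum_time_bound.
    """
    return n / w + m * n / (d * w) + n * l / p + l + math.log2(m)


def direct_convolution(a, b):
    """Sequential reference: c[i] = sum_j a[j] * b[i + j] for 0 <= i < n.

    a has m numbers and b has m + n - 1 numbers, so the result has n numbers.
    This definition is an assumption, not quoted from the paper.
    """
    m = len(a)
    n = len(b) - m + 1
    if m < 1 or n < 1:
        raise ValueError("need len(a) >= 1 and len(b) >= len(a)")
    return [sum(a[j] * b[i + j] for j in range(m)) for i in range(n)]


if __name__ == "__main__":
    print(sum_time_bound(n=1024, w=32, l=100, p=1024))
    print(convolution_time_bound(n=1024, m=32, w=32, l=100, p=1024, d=16))
    print(direct_convolution([1, 2], [1, 2, 3]))
```

With n = 1024, w = 32, l = 100, and p = 1024, the summation terms add up to 32 + 100 + 100 + 10 = 242, which shows how the latency terms nl/p and l can dominate unless enough threads are used to hide them.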

Tags: Algorithms, Computer science, CUDA, Memory model, nVidia, nVidia GeForce GTX 580

June 12, 2013 by hgpu
