Heterogeneous CPU/(GP)GPU Memory Hierarchy Analysis and Optimization

Josué Vladimir Quiroga Esparza
Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya
Universitat Politècnica de Catalunya, 2015


@mastersthesis{quiroga2015heterogeneous,
   title={Heterogeneous CPU/GPU Memory Hierarchy Analysis and Optimization},
   author={Quiroga Esparza, Josu{\'e} Vladimir},
   year={2015},
   publisher={Universitat Polit{\`e}cnica de Catalunya}
}





Heterogeneous systems, and CPU–GPGPU platforms in particular, have attracted a great deal of attention because of the excellent speedups GPUs achieve at low energy cost. Not everything is a success story, however: the complex programming models needed to fully exploit the devices, and the overhead of moving data between them, are among the main obstacles to reaping the benefits of GPGPU computing. Meanwhile, architects at major processor manufacturers such as Intel and AMD have used the transistor budgets afforded by Moore's Law to integrate CPUs and GPUs on the same chip, but the logical integration has not been as simple as placing them side by side on the same die. Fusing these two kinds of cores means fusing two different memory hierarchies: the GPU's, tuned for high memory bandwidth to sustain its throughput, and the CPU's, with multi-level, higher-capacity caches whose coherence protocols give the programmer a strong consistency model at the cost of scalability-limiting coherency traffic. To address this, the Heterogeneous System Architecture (HSA) was developed by the HSA Foundation, founded by ARM, AMD, Qualcomm and many other companies, to reduce the latency of device-to-device communication and to simplify programming (in CUDA or OpenCL) by eliminating copies between disjoint memories, resulting in a unified virtual memory. Building on this, AMD created heterogeneous Uniform Memory Access (hUMA) so that both devices share the system's virtual address space: the GPU can read and write CPU memory addresses directly, the two share page tables, and devices can exchange data simply by passing pointers. On-chip integration brings great improvements, but the memory wall remains, and it is a severe constraint for devices with memory-bandwidth demands as high as the GPU's.

Memory controllers play the leading role in coordinating and scheduling all processor requests to off-chip main memory, accounting for technology latencies, refreshes, and so on. The constraints and scheduling possibilities are so numerous that no single formula can schedule a processor's requests to main memory, so the policies vary from processor to processor. In this master thesis, we propose a scheduling re-ordering scheme based on a hysteresis detector that improves fairness and speedup for the threads issuing memory requests by exploiting the bank-level parallelism of the memory system organization. We first review the evolution of CPUs and GPUs up to their integration into systems and processors that use the GPU for general-purpose computing. We then take a closer look at the memory controller, presenting its general structure and functional elements along with state-of-the-art memory controllers for multicore processors. Next we present our proposed re-ordering system, with its hysteresis detection and re-ordering logic, followed by the methodology: the simulation infrastructure and the benchmarks used. We analyze three configurations: a baseline processor without memory unification, a fused processor with virtual memory unification, and the same fused processor with the proposed bank-parallelism-aware scheduling. Finally, we state the conclusions derived from this analysis and outline future work.
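The combination of hysteresis-driven fairness and bank-level-parallelism-aware re-ordering can be illustrated with a minimal software model. The sketch below is not the thesis's implementation: all names, the request representation, and the threshold value are hypothetical. A hysteresis counter tracks which client (CPU or GPU) has been dominating service and, once a threshold is crossed, flips priority to the starved client; within the favored client, the scheduler prefers a request that targets a different bank than the previous issue, so consecutive accesses can overlap in distinct banks.

```python
from collections import deque

class HysteresisScheduler:
    """Toy model of hysteresis-based memory-request re-ordering.
    All names and thresholds are illustrative, not from the thesis."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.counter = 0        # >0: CPU has been served more; <0: GPU
        self.favored = "CPU"    # client currently given priority

    def _update_hysteresis(self, served_client):
        # Count services per client; flip priority only after the
        # imbalance exceeds the threshold (the hysteresis behavior).
        self.counter += 1 if served_client == "CPU" else -1
        if self.counter >= self.threshold:
            self.favored = "GPU"    # CPU dominated; favor GPU next
            self.counter = 0
        elif self.counter <= -self.threshold:
            self.favored = "CPU"    # GPU dominated; favor CPU next
            self.counter = 0

    def schedule(self, requests):
        """requests: list of (client, bank) tuples in arrival order.
        Returns the issue order after re-ordering."""
        order = []
        pending = deque(requests)
        last_bank = None
        while pending:
            # Prefer the favored client AND a bank different from the
            # last one issued, to exploit bank-level parallelism.
            pick = None
            for req in pending:
                client, bank = req
                if client == self.favored and bank != last_bank:
                    pick = req
                    break
            if pick is None:        # fall back to the oldest request
                pick = pending[0]
            pending.remove(pick)
            order.append(pick)
            last_bank = pick[1]
            self._update_hysteresis(pick[0])
        return order

# Example: two CPU bursts to bank 0 get interleaved with other banks,
# and the GPU is promoted once the CPU has been served twice in a row.
sched = HysteresisScheduler(threshold=2)
reqs = [("CPU", 0), ("CPU", 0), ("GPU", 1), ("CPU", 1), ("GPU", 0)]
print(sched.schedule(reqs))
# → [('CPU', 0), ('CPU', 1), ('GPU', 0), ('GPU', 1), ('CPU', 0)]
```

In this sketch the re-ordering never drops or duplicates requests; it only changes the issue order, alternating banks where possible and bounding how long either client can monopolize the controller.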

