
Towards Understanding and Mitigating Memory-Access Challenges in Computing Systems

Mohammad K. Dashti
The University of British Columbia
The University of British Columbia, 2022

@phdthesis{dashti2022towards,
   title={Towards understanding and mitigating memory-access challenges in computing systems},
   author={Dashti, Mohammad K},
   year={2022},
   school={University of British Columbia}
}

In an era of hardware diversity, the management of applications’ allocated memory is a complex task that can have significant performance repercussions. Non-uniformity in the memory hierarchy, along with heterogeneity and asymmetry of chip designs, makes the cost of memory accesses unpredictable if the allocated memory is not managed carefully. Poor memory allocation and placement on symmetric, non-uniform memory access (NUMA) server systems can cause interconnect link congestion and memory controller contention, which can drastically impact performance. Asymmetric systems consisting of CPU and GPU cores suffer similarly, depending on how memory is allocated and which types of cores access it. Furthermore, modern chip designs are integrating compute units into the memory modules to bring compute closer to memory and eliminate the high cost of transferring data. Hence, accessing memory efficiently is becoming increasingly challenging as systems evolve toward heterogeneity.

Our contribution is a detailed analysis of, and insight into, software and hardware memory management techniques on modern symmetric server-class processors, on asymmetric heterogeneous processors with integrated CPU and GPU cores, and on recently commercially available Processing In Memory (PIM) systems. In particular, we examine three types of systems that are affected by memory-access bottlenecks: first NUMA systems, then integrated CPU-GPU systems, and finally PIM systems. We analyze these systems and suggest possible solutions to mitigate memory-access issues. For NUMA systems, we propose a holistic memory management algorithm that intelligently distributes memory pages to reduce congestion and improve performance. For integrated CPU-GPU systems, we provide a detailed analysis of memory management methods, focusing on performance and functionality trade-offs; our goal is to expose the performance impact of memory management techniques and assess the viability and advantage of running applications with complex behaviors on such systems. Finally, we examine PIM systems with a case study of image decoding, a critical stage for many deep learning applications. We show that the extreme compute scalability of PIM systems can be harnessed to accelerate image decoding, with performance potential that can surpass CPUs and GPUs.
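The NUMA portion of the thesis centers on distributing memory pages so that no single memory controller or interconnect link becomes a hotspot. As a rough illustration of that idea (a minimal sketch using libnuma, not the thesis’s holistic algorithm; the buffer size and build command are assumptions made for the example), the following C snippet contrasts default first-touch placement with interleaved placement:

/* Minimal sketch: first-touch vs. interleaved page placement via libnuma.
 * Interleaving spreads pages round-robin across NUMA nodes, one simple way
 * to avoid concentrating traffic on a single memory controller.
 * Build (example): gcc -O2 numa_placement.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE (256UL * 1024 * 1024)   /* illustrative 256 MiB working set */

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return 1;
    }

    /* Default malloc: pages land on the node of the thread that first
     * touches them, which can overload that node's memory controller. */
    char *local = malloc(BUF_SIZE);

    /* Interleaved allocation: pages are distributed across all NUMA nodes,
     * spreading memory-controller and interconnect load. */
    char *spread = numa_alloc_interleaved(BUF_SIZE);

    if (!local || !spread) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    memset(local, 0, BUF_SIZE);   /* first touch binds these pages to one node */
    memset(spread, 0, BUF_SIZE);  /* these pages are already spread across nodes */

    printf("NUMA nodes configured: %d\n", numa_num_configured_nodes());

    free(local);
    numa_free(spread, BUF_SIZE);
    return 0;
}

Static interleaving is only one simple mechanism; it is shown here merely to make the page-distribution idea behind the thesis’s congestion argument concrete.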