GPU Remote Memory Access Programming

Jeremia Bär
Scalable Parallel Computing Laboratory, Department of Computer Science, ETH Zurich
Master thesis, ETH Zurich, 2015





High performance computing studies the construction and programming of computing systems with tremendous computational power, which play a key role in scientific computing and research across disciplines. The graphics processing unit (GPU), developed for fast 2D and 3D visualization, has turned into a programmable general-purpose accelerator that boosts today’s high performance clusters. Leveraging these computational resources requires good programming model abstractions to manage system complexity. Today’s state of the art employs separate models for cluster communication and GPU computation, such as MPI and CUDA. The bulk-synchronous nature of CUDA and the role of the GPU as a CPU-dependent co-processor limit cluster utilization. In this master thesis we devise, implement, and evaluate GPU RMA, a novel GPU cluster programming model that addresses three key areas. First, GPU RMA introduces a simplifying abstraction that exposes the cluster as a set of interacting ranks. The ranks execute on the GPU, removing the complexity of explicit GPU management from the host. Second, GPU RMA introduces communication among ranks on local and remote devices, allowing more fine-grained communication and synchronization than conventional models and thereby increasing concurrency and resource usage. Third, GPU RMA implements one-sided notified access communication, leveraging the recently introduced RDMA support for GPUs to provide richer communication semantics than current solutions. We implemented GPU RMA on a cluster with Intel CPUs, Nvidia GPUs, and an InfiniBand interconnect, studied the impact of system components on communication latency, and derived an empirical performance model. We performed a case study to assess GPU RMA performance against the established MPI and CUDA approach and validated our performance model. Our experiments show that GPU RMA improves performance by up to 25% compared to the state of the art, which encourages further study of GPU RMA performance and scalability.
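
The idea of one-sided notified access mentioned in the abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions: it uses Python threads in place of GPU ranks, and the names `Window`, `put_notify`, and `wait_notify` are hypothetical and do not come from the thesis or from any real library. It shows the general pattern, in which the origin rank writes directly into memory exposed by the target and then posts a notification, so the target waits only for that notification instead of posting a matching receive.

```python
import queue
import threading


class Window:
    """Memory exposed by one rank for remote access (hypothetical name)."""

    def __init__(self, size):
        self.buf = [0] * size
        # Tags of completed remote writes, in arrival order.
        self.notifications = queue.Queue()


def put_notify(target, offset, data, tag):
    """One-sided write: store data in the target's window, then notify.

    The target takes no part in the data transfer itself.
    """
    target.buf[offset:offset + len(data)] = data
    # The notification is posted only after the data is in place.
    target.notifications.put(tag)


def wait_notify(window, tag, timeout=5.0):
    """Block until the next notification arrives and check its tag."""
    got = window.notifications.get(timeout=timeout)
    if got != tag:
        raise RuntimeError(f"unexpected notification tag {got}")


def run_demo():
    win = Window(4)  # window owned by "rank 1"
    result = []

    def rank0():
        # Origin: writes into rank 1's window without rank 1 calling receive.
        put_notify(win, 0, [1, 2, 3, 4], tag=7)

    def rank1():
        # Target: waits for the notification, then reads its own memory.
        wait_notify(win, tag=7)
        result.append(sum(win.buf))

    threads = [threading.Thread(target=f) for f in (rank1, rank0)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the notification doubles as the synchronization point, which is what distinguishes notified access from a plain one-sided put followed by a separate synchronization call. On real hardware the write would travel over RDMA rather than through shared Python objects.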
