
Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation

Michail Papadimitriou, Juan Fumero, Athanasios Stratikopoulos, Christos Kotselidis
The University of Manchester, United Kingdom
The 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’21), 2021

@inproceedings{papadimitriou2021automatically,
   title={Automatically exploiting the memory hierarchy of GPUs through just-in-time compilation},
   author={Papadimitriou, Michail and Fumero, Juan and Stratikopoulos, Athanasios and Kotselidis, Christos},
   booktitle={Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments},
   pages={57--70},
   year={2021}
}


Although Graphics Processing Units (GPUs) have become pervasive for data-parallel workloads, the efficient exploitation of their tiered memory hierarchy requires explicit programming. The efficient utilization of different GPU memory tiers can yield higher performance at the expense of programmability since developers must have extended knowledge of the architectural details in order to utilize them. In this paper, we propose an alternative approach based on Just-In-Time (JIT) compilation to automatically and transparently exploit local memory allocation and data locality on GPUs. In particular, we present a set of compiler extensions that allow arbitrary Java programs to utilize local memory on GPUs without explicit programming. We prototype and evaluate our proposed solution in the context of TornadoVM against a set of benchmarks and GPU architectures, showcasing performance speedups of up to 2.5x compared to equivalent baseline implementations that do not utilize local memory or data locality. In addition, we compare our proposed solution against hand-written optimized OpenCL code to assess the upper bound of performance improvements that can be transparently achieved by JIT compilation without trading programmability. The results showcase that the proposed extensions can achieve up to 94% of the performance of the native code, highlighting the efficiency of the generated code.
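To make the idea of local-memory tiling concrete, the following is a minimal CPU-only sketch in plain Java. It is not the TornadoVM API and not the code the paper's compiler generates; the class name, the method names, and the `tileSize` parameter are hypothetical and chosen only for illustration. The baseline loop is the kind of code a developer would write without any knowledge of the GPU; the tiled variant imitates by hand what a local-memory transformation does: each simulated work-group stages one tile of the input in a small buffer (standing in for GPU local memory), reduces it there, and writes a single partial result.

```java
public class LocalMemoryReductionSketch {

    // Baseline: every element is read directly from the input array,
    // which plays the role of GPU global memory in this sketch.
    public static float sumNaive(float[] input) {
        float sum = 0.0f;
        for (int i = 0; i < input.length; i++) {
            sum += input[i];
        }
        return sum;
    }

    // Tiled variant: one loop iteration of 'g' stands in for one work-group.
    // The 'local' buffer is an assumption used to mimic GPU local memory.
    public static float sumTiled(float[] input, int tileSize) {
        if (tileSize <= 0) {
            throw new IllegalArgumentException("tileSize must be positive");
        }
        int groups = (input.length + tileSize - 1) / tileSize;
        float[] partial = new float[groups];
        float[] local = new float[tileSize];

        for (int g = 0; g < groups; g++) {
            int base = g * tileSize;

            // Stage the tile in the local buffer, padding the last tile with zeros.
            for (int l = 0; l < tileSize; l++) {
                int idx = base + l;
                local[l] = idx < input.length ? input[idx] : 0.0f;
            }

            // Tree-style reduction inside the tile; works for any tile size.
            // Each pass folds the upper half of the live range onto the lower half.
            for (int n = tileSize; n > 1; n = (n + 1) / 2) {
                int newN = (n + 1) / 2;
                for (int l = 0; l < n / 2; l++) {
                    local[l] += local[l + newN];
                }
            }

            // One write back to "global memory" per work-group.
            partial[g] = local[0];
        }

        // Final pass over the per-group partial sums.
        float sum = 0.0f;
        for (int g = 0; g < groups; g++) {
            sum += partial[g];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] data = new float[1000];
        java.util.Arrays.fill(data, 1.0f);
        System.out.println("naive = " + sumNaive(data));
        System.out.println("tiled = " + sumTiled(data, 64));
    }
}
```

On a CPU the two methods differ only in structure, not in speed; the sketch is meant to show the shape of the transformation that the abstract says is applied automatically, so that the developer only has to write something like `sumNaive`.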
