
LocalityGuru: A PTX Analyzer for Extracting Thread Block-level Locality in GPGPUs

Devashree Tripathy, AmirAli Abdolrashidi, Quan Fan, Daniel Wong, Manoranjan Satpathy
University of California, Riverside, CA, USA
15th IEEE International Conference on Networking, Architecture, and Storage (NAS), 2021

@article{tripathy2021localityguru,
   title={LocalityGuru: A PTX Analyzer for Extracting Thread Block-level Locality in GPGPUs},
   author={Tripathy, Devashree and Abdolrashidi, AmirAli and Fan, Quan and Wong, Daniel and Satpathy, Manoranjan},
   year={2021}
}


Exploiting data locality in GPGPUs is critical for making efficient use of their small data caches and for easing the memory bottleneck. This paper proposes a thread block-centric locality analysis, which quantifies the locality among thread blocks (TBs) as the number of data references they have in common. LocalityGuru performs a detailed just-in-time (JIT) compilation analysis of the static memory accesses in the source code and derives the mapping between threads and data indices at kernel launch time. The analysis can be applied at multiple granularities in a GPU kernel: threads, warps, and thread blocks. The resulting information can guide locality-aware data partitioning, memory page placement, cache management, and scheduling in single-GPU and multi-GPU systems. The results of the LocalityGuru PTX analyzer are validated by comparing them with the locality graph obtained through profiling. Because the entire analysis is carried out by the compiler before kernel launch, it adds no timing overhead to kernel execution.
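To make the notion of TB-level locality concrete, the following is a minimal Python sketch of the quantity the abstract describes: the number of data references shared by each pair of thread blocks. It is not the authors' PTX analyzer. It simply enumerates threads, whereas LocalityGuru derives the thread-to-index mapping from the PTX. The function names (`tb_footprint`, `locality_graph`, `stencil_indices`) and the 3-point stencil access pattern are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def tb_footprint(block_id, block_dim, index_fn):
    """Return the set of data indices referenced by all threads of one thread block."""
    refs = set()
    for thread_id in range(block_dim):
        # Same formula a 1D CUDA kernel uses: blockIdx.x * blockDim.x + threadIdx.x
        global_tid = block_id * block_dim + thread_id
        refs.update(index_fn(global_tid))
    return refs


def locality_graph(num_blocks, block_dim, index_fn):
    """Map each TB pair (a, b) to the number of data indices both blocks reference."""
    footprints = [tb_footprint(b, block_dim, index_fn) for b in range(num_blocks)]
    graph = {}
    for a, b in combinations(range(num_blocks), 2):
        shared = len(footprints[a] & footprints[b])
        if shared:  # keep only pairs that actually share data
            graph[(a, b)] = shared
    return graph


def stencil_indices(global_tid):
    # Assumed access pattern (not from the paper): thread i reads in[i-1], in[i], in[i+1].
    # Array-boundary checks are omitted for brevity.
    return (global_tid - 1, global_tid, global_tid + 1)


graph = locality_graph(num_blocks=4, block_dim=8, index_fn=stencil_indices)
for (a, b), shared in sorted(graph.items()):
    print(f"TB{a} <-> TB{b}: {shared} shared references")
```

With this assumed stencil, only neighbouring thread blocks share data (the two halo elements at each block boundary), which is the kind of pairing a locality-aware scheduler or page-placement policy could exploit by co-locating those blocks.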