Characterizing and Optimizing Irregular Applications on Graphics Processing Units

Tao Zhang
The University of New Mexico, Albuquerque, New Mexico
The University of New Mexico, 2015

@article{zhang2015characterizing,
   title={Characterizing and Optimizing Irregular Applications on Graphics Processing Units},
   author={Zhang, Tao},
   year={2015}
}


In recent years, GPGPUs have experienced tremendous growth as general-purpose, high-throughput computing devices, and applications from various domains achieve significant speedups on them. However, irregular applications do not perform well because of the mismatch between irregular problem structures and SIMD-like GPU architectures. The lack of in-depth characterization and quantification of how irregular applications differ from regular ones on GPGPUs has prevented users from making effective use of the hardware resources. To characterize the performance aspects and analyze the bottlenecks, a suite of representative irregular applications is examined on a cycle-accurate GPU simulator as well as on a real GPU. The experimental results identify control-flow divergence, irregular memory accesses, and load imbalance as the major issues degrading performance. Software and hardware approaches are therefore proposed to improve the performance of irregular applications on GPUs. Specifically, a task-pool software approach is designed to load-balance irregular applications. To facilitate the application of this approach, an open-source library called CUIRRE is proposed, which can characterize and load-balance general irregular applications through convenient APIs. CUIRRE performs load balancing and profiling online and analysis offline, balancing performance against accuracy. In addition, gGraph, a platform for graph processing, is designed and implemented based on the task-pool approach. gGraph can handle the irregularity of graph data and/or graph algorithms on GPUs, and it exploits parallelism and the overlap of CPU processing and I/O processing as fully as possible. Finally, several general irregular applications and three graph algorithms are optimized using the CUIRRE library and the gGraph platform, respectively, achieving remarkable speedups.
