
Non-Uniform Domain Decomposition for Heterogeneous Accelerated Processing Units

Gabriel Freytag, Philippe O. A. Navaux, Joao V. F. Lima, Lucas Mello Schnorr, Paolo Rech
Universidade Federal do Rio Grande do Sul, Porto Alegre, 9500, RS, Brazil
13th International Meeting on High Performance Computing for Computational Science (VECPAR), 2018

@article{freytag2018non,
   title={Non-Uniform Domain Decomposition for Heterogeneous Accelerated Processing Units},
   author={Freytag, Gabriel and Navaux, Philippe OA and Lima, Joao VF and Schnorr, Lucas Mello and Rech, Paolo},
   year={2018}
}


The use of heterogeneous architectures has become indispensable in optimizing application performance. Nowadays, one of the most popular heterogeneous architectures is the discrete CPU+GPU combination. Despite the high computational power of such architectures, memory transfers between CPU and GPU are in many cases a significant performance bottleneck. To mitigate the cost of these transfers, chip makers started to integrate CPU and GPU cores in the same fabric, sharing the same main memory but with different memory address spaces, in architectures called APUs (Accelerated Processing Units). To efficiently exploit heterogeneous CPU+GPU architectures, the data must be split so that both processing units (PUs) can perform the computations in parallel. Although this approach results in significant performance improvements, some applications can also be split functionally, as is the case for the Lattice-Boltzmann Method (LBM). In this work, we evaluate the performance of each kernel resulting from the functional decomposition of an OpenCL LBM implementation under non-uniform domain decomposition between CPU and GPU on an APU, in order to better understand how different non-uniform decompositions affect each kernel. Experiments performed on an AMD A10-7870K APU show that, when the domain decomposition is uniform across the kernels on the same PU but non-uniform between CPU and GPU, each kernel is affected differently. These results suggest that making the decomposition non-uniform across the kernels on the same PU as well, and not only between the PUs, can further improve application performance.
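
The difference between one CPU/GPU split shared by all kernels and a separate split per kernel can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the row-wise partitioning, the kernel names `collide` and `stream`, and the fractions used are all assumptions chosen for illustration, and no OpenCL calls are made.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # half-open [start, stop) range of lattice rows


def split_rows(n_rows: int, gpu_fraction: float) -> Dict[str, Range]:
    """Split n_rows between CPU and GPU (hypothetical helper).

    The GPU receives roughly gpu_fraction of the rows and the CPU the rest.
    A fraction of 0.5 is a uniform split; any other value is non-uniform.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be in [0, 1]")
    gpu_rows = round(n_rows * gpu_fraction)
    cpu_rows = n_rows - gpu_rows
    return {"cpu": (0, cpu_rows), "gpu": (cpu_rows, n_rows)}


def per_kernel_split(
    n_rows: int, gpu_fractions: Dict[str, float]
) -> Dict[str, Dict[str, Range]]:
    """Give each kernel its own CPU/GPU split (hypothetical helper).

    gpu_fractions maps an assumed kernel name to the share of rows
    that kernel should process on the GPU.
    """
    return {
        kernel: split_rows(n_rows, fraction)
        for kernel, fraction in gpu_fractions.items()
    }


# Same non-uniform CPU/GPU split for every kernel.
shared = per_kernel_split(128, {"collide": 0.75, "stream": 0.75})

# A different CPU/GPU split per kernel.
tuned = per_kernel_split(128, {"collide": 0.75, "stream": 0.5})
```

In a real application the fractions would come from measurements of each kernel on each PU, and differing splits between consecutive kernels would also require exchanging the rows whose owner changes, which this sketch does not model.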
