C-DAC’s Efforts – Application Kernels on HPC Cluster with GPU Accelerators

VCV.Rao, Nisha Agrawa, Samrit Maity
HPC Frontier Technologies, Exploration Group, C-DAC, Pune University Campus, Pune 411 007, Maharashtra, India
ATIP – A*CRC Workshop on Accelerator Technologies in High Performance Computing, 2012








We describe the parallelization of finite difference method (FDM) and finite element method (FEM) computations for a certain class of partial differential equations (PDEs) on a High Performance Computing (HPC) GPU cluster. For FDM, structured grids are employed and optimal data rearrangement operations are performed in the GPU computations. For FEM, unstructured triangular and hexahedral meshes are generated, and the graph partitioning software METIS [14] is used to generate load-balanced sub-domains. Iterative methods are used to solve the resulting algebraic system of linear equations. Our experiments use MPI combined with CUDA and OpenCL on the NVIDIA GPUs, and with OpenCL on the AMD-ATI GPUs, of the HPC GPU cluster [4,6,7,8]. They indicate that the MPI-CUDA codes based on FDM and FEM achieve nearly 6x speed-up for large mesh sizes compared with a host-CPU implementation of the same code. The un-optimized OpenCL implementation shows only marginal improvement in GPU times, whereas the corresponding CUDA codes achieve a maximum speed-up of 4x to 6x on the HPC GPU cluster. We present a performance analysis for different mesh sizes that demonstrates the performance and scalability of FDM and FEM computations on the GPU cluster.
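To make the FDM part of the abstract concrete, the following is a minimal sketch of an iterative solve on a structured grid. The abstract names neither the PDE nor the iterative method, so the choice of the 2D Laplace equation, the Jacobi iteration, the boundary values, and the function name `jacobi_laplace` are all assumptions made for illustration; this is not the authors' code. It runs on the CPU in plain Python. In an MPI-CUDA setting, each interior update would typically map to one GPU thread, and each MPI rank would own one block of the grid.

```python
# Illustrative sketch only: the PDE (2D Laplace), the Jacobi solver, and all
# names below are assumptions, not taken from the paper.

def jacobi_laplace(n, top=1.0, tol=1e-6, max_iters=10_000):
    """Solve Laplace's equation on an n x n structured grid with Jacobi sweeps.

    Boundary: u = `top` on the first row, u = 0 on the other three edges.
    Returns (grid, iterations_used).
    """
    u = [[0.0] * n for _ in range(n)]
    u[0] = [top] * n  # Dirichlet condition on the top edge

    for it in range(1, max_iters + 1):
        new = [row[:] for row in u]
        max_change = 0.0
        # Each interior update reads only the previous iterate, so all points
        # are independent of one another within a sweep.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (
                    u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                )
                max_change = max(max_change, abs(new[i][j] - u[i][j]))
        u = new
        if max_change < tol:
            return u, it
    return u, max_iters


if __name__ == "__main__":
    grid, iters = jacobi_laplace(17)
    print(f"stopped after {iters} sweeps; centre value = {grid[8][8]:.4f}")
```

Jacobi is used here because its updates have no data dependencies within a sweep, which is the property that makes stencil computations of this kind a natural fit for GPUs.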