high performance computing on graphics processing units: hgpu.org

C-DAC’s Efforts – Application Kernels on HPC Cluster with GPU Accelerators

VCV.Rao, Nisha Agrawa, Samrit Maity

HPC Frontier Technologies, Exploration Group, C-DAC, Pune University Campus, Pune 411 007, Maharashtra, India

ATIP – A*CRC Workshop on Accelerator Technologies in High Performance Computing, 2012

@article{rao2012cdac,
  title={C-DAC’s Efforts – Application Kernels on HPC Cluster with GPU Accelerators},
  author={Rao, VCV. and Agrawa, Nisha and Maity, Samrit},
  year={2012}
}

We describe the parallelization of finite difference method (FDM) and finite element method (FEM) computations for a certain class of partial differential equations (PDEs) on a High Performance Computing (HPC) GPU cluster. For FDM, structured grids are employed and optimal data rearrangement operations are performed in the GPU computations. For FEM, unstructured triangular and hexahedral meshes are generated, and the graph partitioning software METIS [14] is used to generate load-balanced sub-domains. Iterative methods are used to solve the resulting algebraic system of linear equations. Our experiments [4,6,7,8] combine MPI with CUDA and OpenCL on the NVIDIA GPUs, and MPI with OpenCL on the AMD-ATI GPUs, of the HPC GPU cluster. The experiments indicate that the MPI-CUDA codes based on FDM and FEM achieve nearly 6x speed-ups for large mesh sizes in comparison to a host-CPU implementation of the same code. The un-optimized OpenCL implementation showed only marginal speed-ups, whereas the counterpart CUDA codes achieved maximum speed-ups of 4x to 6x on the HPC GPU cluster. We present a performance analysis for different mesh sizes that demonstrates the performance and scalability of FDM and FEM computations on the GPU cluster.
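To make the FDM-plus-iterative-solver idea concrete, here is a minimal serial sketch in plain Python. It is not the authors' code: the abstract names neither the PDE nor the iterative method, so the choice of Laplace's equation, the five-point stencil, Jacobi iteration, and the function name `jacobi_laplace` with its parameters are all assumptions made for illustration. Each interior update depends only on values from the previous sweep, which is the property that lets a kernel of this kind map onto GPU threads and MPI sub-domains.

```python
def jacobi_laplace(n, top=1.0, tol=1e-6, max_iters=10_000):
    """Jacobi iteration for Laplace's equation on an n x n structured grid.

    Assumed setup (not from the paper): the top edge is held at `top`,
    the other three edges at 0, and the interior starts at 0.
    Returns the grid and the number of sweeps performed.
    """
    u = [[0.0] * n for _ in range(n)]
    u[0] = [top] * n  # Dirichlet boundary on the top edge

    for sweep in range(1, max_iters + 1):
        new = [row[:] for row in u]  # boundary values are copied unchanged
        max_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Five-point stencil: average of the four neighbours,
                # read only from the previous sweep's grid `u`.
                new[i][j] = 0.25 * (
                    u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                )
                max_change = max(max_change, abs(new[i][j] - u[i][j]))
        u = new
        if max_change < tol:
            return u, sweep
    return u, max_iters


if __name__ == "__main__":
    grid, sweeps = jacobi_laplace(9)
    print(f"stopped after {sweeps} sweeps")
    print(f"centre value: {grid[4][4]:.4f}")
```

In a cluster setting, the grid would typically be split into sub-domains, with each process exchanging its boundary rows with neighbouring processes between sweeps; that exchange is omitted here to keep the sketch short.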

Tags: AMD FirePro V5900, AMD FirePro V7900, AMD FireStream 9350, ATI, Computer science, CUDA, Differential equations, FEM, Finite difference, Finite element method, GPU cluster, Heterogeneous systems, Linear Algebra, MPI, nVidia, OpenCL, Partial differential equations, PDEs, Tesla C2060

May 19, 2012 by hgpu
