high performance computing on graphics processing units: hgpu.org

C-DAC’s Efforts – Application Kernels on HPC Cluster with GPU Accelerators

VCV.Rao, Nisha Agrawa, Samrit Maity

HPC Frontier Technologies, Exploration Group, C-DAC, Pune University Campus, Pune 411 007, Maharashtra, India

ATIP – A*CRC Workshop on Accelerator Technologies in High Performance Computing, 2012

@article{rao2012cdac,
  title={C-DAC’s Efforts – Application Kernels on HPC Cluster with GPU Accelerators},
  author={Rao, VCV. and Agrawa, Nisha and Maity, Samrit},
  year={2012}
}

We describe the parallelization of finite difference method (FDM) and finite element method (FEM) computations for a certain class of partial differential equations (PDEs) on a High Performance Computing (HPC) GPU cluster. For FDM, structured grids are employed and optimal data rearrangement operations are performed in the GPU computations. For FEM, unstructured triangular and hexahedral meshes are generated, and the METIS graph partitioning software [14] is used to generate load-balanced sub-domains. Iterative methods are used to solve the resulting algebraic system of linear equations. Our experiments use MPI combined with CUDA and OpenCL on the NVIDIA GPUs, and with OpenCL on the AMD-ATI GPUs, of the HPC GPU cluster [4,6,7,8]. They indicate that the MPI-CUDA codes based on FDM and FEM achieve nearly 6x speed-up for large mesh sizes compared with a host-CPU implementation of the same code. The unoptimized OpenCL implementation shows only a marginal improvement in GPU times, whereas the corresponding CUDA codes achieve a maximum speed-up of 4x to 6x on the HPC GPU cluster. We present a performance analysis for different mesh sizes that demonstrates the performance and scalability of FDM and FEM computations on the GPU cluster.
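
To make the FDM-plus-iterative-solver combination concrete, here is a minimal serial sketch. The abstract names neither the PDE nor the iterative method, so the choices below (2D Laplace equation, five-point stencil, Jacobi iteration) and the function name `jacobi_laplace` are assumptions made for illustration, not the authors' code.

```python
# Illustrative sketch only: the abstract does not name the PDE or the solver.
# Assumed here: 2D Laplace equation, five-point stencil, Jacobi iteration.

def jacobi_laplace(n, top=1.0, tol=1e-6, max_iters=10_000):
    """Solve Laplace's equation on an n x n structured grid.

    Boundary values: `top` on the first row, 0.0 on the other three edges.
    Returns (grid, iterations_used).
    """
    u = [[0.0] * n for _ in range(n)]
    u[0] = [top] * n  # Dirichlet condition on the top edge

    for it in range(1, max_iters + 1):
        new = [row[:] for row in u]  # boundary rows/columns are copied unchanged
        max_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Each interior point becomes the average of its four neighbours,
                # read from the previous iterate only.
                new[i][j] = 0.25 * (
                    u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                )
                max_change = max(max_change, abs(new[i][j] - u[i][j]))
        u = new
        if max_change < tol:
            return u, it
    return u, max_iters


if __name__ == "__main__":
    grid, iters = jacobi_laplace(9)
    print(f"converged after {iters} iterations; centre value {grid[4][4]:.4f}")
```

Because every interior update reads only the previous iterate, the updates are independent of one another. That independence is what makes a stencil of this kind a natural fit for one GPU thread per grid point, with boundary rows exchanged between neighbouring sub-domains when the grid is split across MPI processes.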

Tags: AMD FirePro V5900, AMD FirePro V7900, AMD FireStream 9350, ATI, Computer science, CUDA, Differential equations, FEM, Finite difference, Finite element method, GPU cluster, Heterogeneous systems, Linear Algebra, MPI, nVidia, OpenCL, Partial differential equations, PDEs, Tesla C2060

May 19, 2012 by hgpu
