high performance computing on graphics processing units: hgpu.org

Parallelizing Alternating Direction Implicit Solver on GPUs

Zhangping Wei, Byunghyun Jang, Yaoxin Zhang, Yafei Jia

National Center for Computational Hydroscience & Engineering, The University of Mississippi, University, MS 38677 U.S.A.

Procedia Computer Science, Volume 18, Pages 389-398, 2013

DOI:10.1016/j.procs.2013.05.202

We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves on existing implementations in two respects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource constraints. As a result, our parallel ADI solver, which is based on PCR, no longer has a maximum domain size limitation. Second, we optimize the inefficient data accesses of the parallel ADI solver by leveraging hardware texture memory and matrix transpose techniques. These memory optimizations make the already parallelized ADI solver twice as fast, for an overall speedup of more than 100 times over a highly optimized CPU version. We also present an analysis of the numerical accuracy of the proposed parallel ADI solver.
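For readers unfamiliar with PCR, the sketch below may help. Each ADI sweep reduces to a set of tridiagonal systems, and PCR solves one such system in about log2(n) reduction steps in which every equation is updated independently. This is a serial, pure-Python illustration of the textbook PCR recurrence, not the authors' GPU implementation; the function name `pcr_solve` and the sample system are assumptions made for illustration only.

```python
def pcr_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by
    Parallel Cyclic Reduction. Assumes a[0] == 0 and c[-1] == 0,
    and a well-conditioned (e.g. diagonally dominant) system.
    Hypothetical helper, written serially for clarity."""
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        # Every i is independent of the others within one step;
        # on a GPU each i would map to its own thread.
        for i in range(n):
            im, ip = i - stride, i + stride
            if im >= 0:
                k1 = a[i] / b[im]      # eliminates x[i - stride]
                na[i] = -a[im] * k1
                nb[i] -= c[im] * k1
                nd[i] -= d[im] * k1
            if ip < n:
                k2 = c[i] / b[ip]      # eliminates x[i + stride]
                nc[i] = -c[ip] * k2
                nb[i] -= a[ip] * k2
                nd[i] -= d[ip] * k2
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # All off-diagonal couplings are now zero: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


if __name__ == "__main__":
    # Small diagonally dominant example system (illustrative values).
    lower = [0.0, 1.0, 1.0, 1.0, 1.0]
    diag = [4.0] * 5
    upper = [1.0, 1.0, 1.0, 1.0, 0.0]
    rhs = [6.0, 12.0, 18.0, 24.0, 24.0]
    print(pcr_solve(lower, diag, upper, rhs))
```

Because each step reads all four coefficient arrays at offsets of plus and minus the stride, the working set and the access pattern are the natural places for the resource limits and memory inefficiencies that the abstract refers to.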

Tags: Computer science, CUDA, nVidia, nVidia GeForce GTX 680, Performance

November 6, 2013 by hgpu
