high performance computing on graphics processing units: hgpu.org

Fast evaluation of Helmholtz potential on graphics processing units (GPUs)

Shaojing Li, Boris Livshitz, Vitaliy Lomakin

Department of Electrical and Computer Engineering, University of California, San Diego, United States

Journal of Computational Physics, Vol. 229, No. 22. (01 November 2010), pp. 8463-8483.

DOI:10.1016/j.jcp.2010.07.029

This paper presents a parallel algorithm implemented on graphics processing units (GPUs) for rapidly evaluating spatial convolutions between the Helmholtz potential and a large-scale source distribution. The algorithm implements a non-uniform grid interpolation method (NGIM), which uses amplitude and phase compensation and spatial interpolation from a sparse grid to compute the field outside a source domain. NGIM reduces the computational time of the direct field evaluation at N observers due to N co-located sources from O(N^2) to O(N) in the static and low-frequency regimes, to O(N log N) in the high-frequency regime, and to between these costs in the mixed-frequency regime. Memory requirements scale as O(N) in all frequency regimes. Achieving optimal performance on each platform requires several important differences between the CPU and GPU implementations of the NGIM. In particular, in the CPU implementations all operations are, where possible, pre-computed and stored in memory in a preprocessing stage. This reduces the computational time but significantly increases the memory consumption. In the GPU implementations, where memory handling is often a critical bottleneck, several special memory handling techniques are used to accelerate the computations. The significant latency of GPU global memory access is hidden by implementing coalesced reading, which requires arranging many array elements in contiguous parts of memory. In contrast to the CPU version, most of the steps in the GPU implementations are executed on the fly and only the necessary arrays are kept in memory. This results in significantly reduced memory consumption, a larger problem size N that can be handled, and reduced computational time on GPUs. The obtained GPU-CPU speed-up ratios range from 150 to 400, depending on the required accuracy and problem size. The presented method and its CPU and GPU implementations can find important applications in various fields of physics and engineering.
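
For orientation, the direct O(N^2) evaluation that the abstract takes as its baseline can be sketched in a few lines of Python. This is not the authors' code and it is not NGIM: it is only the brute-force sum over all source-observer pairs. The function name `direct_helmholtz`, the kernel convention exp(-jkR)/R (no 4π factor, this particular sign of the phase), and the choice to skip the singular self-term are all assumptions made for illustration.

```python
import cmath
import math


def direct_helmholtz(points, charges, k):
    """Brute-force field at every point due to all other points.

    Assumed kernel (illustrative convention): exp(-1j * k * R) / R.
    points  : list of (x, y, z) tuples, used as both sources and observers
    charges : list of source amplitudes, one per point
    k       : wavenumber (k = 0 gives the static 1/R potential)
    """
    n = len(points)
    field = [0j] * n
    for m in range(n):              # loop over observers
        xm, ym, zm = points[m]
        total = 0j
        for s in range(n):          # loop over sources: N * N pairs overall
            if s == m:
                continue            # skip the singular self-term
            xs, ys, zs = points[s]
            r = math.sqrt((xm - xs) ** 2 + (ym - ys) ** 2 + (zm - zs) ** 2)
            total += charges[s] * cmath.exp(-1j * k * r) / r
        field[m] = total
    return field


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    q = [1.0, 1.0, 1.0]
    print(direct_helmholtz(pts, q, k=0.0))
```

The nested loops make the quadratic cost explicit: doubling N quadruples the number of kernel evaluations, which is the scaling that the method described in the abstract reduces to O(N) or O(N log N).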

Tags: Electrodynamics, Integral equations

November 18, 2010 by hgpu

