high performance computing on graphics processing units: hgpu.org

Conjugate gradient solvers on Intel Xeon Phi and NVIDIA GPUs

O. Kaczmarek, C. Schmidt, P. Steinbrecher, M. Wagner

Fakultät für Physik, Universität Bielefeld, D-33615 Bielefeld, Germany

arXiv:1411.4439 [physics.comp-ph], (17 Nov 2014)

Lattice Quantum Chromodynamics simulations typically spend most of their runtime in inversions of the fermion matrix, so this part is frequently optimized for various HPC architectures. Here we compare the performance of the Intel Xeon Phi with that of current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through solving for multiple right-hand-side vectors at the same time, we obtain a performance greater than 300 GFlop/s on both architectures, which more than doubles the performance of the inversions. We also give a short overview of the Knights Corner architecture and discuss some details of the implementation and the effort required to obtain the achieved performance.
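The idea of inverting multiple vectors at once can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: it assumes a small dense symmetric positive-definite matrix instead of the lattice fermion matrix, and the names `dot`, `apply_matrix` and `multi_rhs_cg` are hypothetical. The point it shows is that each row of the matrix is read once per iteration and reused for every right-hand side, which is where the extra parallelism and data reuse would come from.

```python
def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def apply_matrix(A, vectors):
    """Apply A to several vectors in one sweep over the rows of A.

    Each row is fetched once and reused for all vectors. On an
    accelerator this reuse is what raises the arithmetic intensity.
    """
    out = [[0.0] * len(A) for _ in vectors]
    for i, row in enumerate(A):
        for k, v in enumerate(vectors):
            out[k][i] = dot(row, v)
    return out


def multi_rhs_cg(A, B, tol=1e-10, max_iter=1000):
    """Run independent CG recurrences for every right-hand side in B.

    Assumes A is symmetric positive definite (an assumption of this
    sketch). Each system keeps its own alpha and beta; only the matrix
    application is shared between the systems.
    """
    X = [[0.0] * len(b) for b in B]      # initial guess x_k = 0
    R = [list(b) for b in B]             # residuals r_k = b_k - A x_k
    P = [list(b) for b in B]             # search directions
    rr = [dot(r, r) for r in R]
    target = [tol * tol * dot(b, b) for b in B]

    for _ in range(max_iter):
        # Systems that already reached the relative tolerance are skipped.
        active = [k for k in range(len(B)) if rr[k] > target[k]]
        if not active:
            break
        AP = apply_matrix(A, [P[k] for k in active])
        for ap, k in zip(AP, active):
            alpha = rr[k] / dot(P[k], ap)
            X[k] = [x + alpha * p for x, p in zip(X[k], P[k])]
            R[k] = [r - alpha * q for r, q in zip(R[k], ap)]
            rr_new = dot(R[k], R[k])
            beta = rr_new / rr[k]
            P[k] = [r + beta * p for r, p in zip(R[k], P[k])]
            rr[k] = rr_new
    return X
```

Calling `multi_rhs_cg(A, [b1, b2])` returns one solution vector per right-hand side. In a production solver the inner loop over vectors would map to threads or SIMD lanes rather than to a Python loop.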

Tags: Computational Physics, Conjugate gradient solver, CUDA, High Energy Physics – Lattice, Intel Xeon Phi, Mathematical Software, nVidia, Physics, Tesla K20, Tesla K40

November 18, 2014 by hgpu
