Performance analysis of matrix-free conjugate gradient kernels using SYCL

Igor Baratta, Chris Richardson, Garth Wells

Department of Engineering, University of Cambridge, United Kingdom

International Workshop on OpenCL (IWOCL’22), 2022

DOI:10.1145/3529538.3529993

Package: Matrix-free CG SYCL benchmarks

We examine the performance of matrix-free SYCL implementations of the conjugate gradient method for solving sparse linear systems of equations. Performance is tested on an NVIDIA A100-80GB device and a dual-socket Intel Ice Lake CPU node using different SYCL implementations, and is compared to CUDA BLAS (cuBLAS) implementations on the A100 GPU and MKL implementations on the CPU node. All kernels considered in the matrix-free implementation are memory-bandwidth limited, and a simple performance model is applied to estimate the asymptotic memory bandwidth and the latency. Our experiments show that in most cases the SYCL implementations considered match the asymptotic performance of the reference implementations. However, for smaller but practically relevant problem sizes, latency is observed to have a significant impact on performance. In some cases the SYCL latency is reasonably close to that of the reference (cuBLAS/MKL) implementation, but in others it is more than an order of magnitude greater. In particular, SYCL built-in reductions on the GPU, and all operations for one of the SYCL implementations on the CPU, exhibit high latency. This latency limits performance at problem sizes that can in some cases be representative of full application simulations, and it can degrade strong-scaling performance.
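The abstract does not spell out the form of the performance model, so the sketch below is only an illustration under an assumed form: a two-parameter latency-bandwidth model, t = latency + bytes / bandwidth, fitted to kernel timings by ordinary least squares. The function name, the axpy byte count, and the synthetic latency and bandwidth values are all hypothetical and are not taken from the paper.

```python
# Assumed model (not confirmed by the source): t(bytes) = latency + bytes / bandwidth.
# All numbers below are synthetic placeholders, not measurements from the paper.

def fit_latency_bandwidth(nbytes, seconds):
    """Least-squares line fit of seconds against bytes moved.

    Returns (latency in seconds, asymptotic bandwidth in bytes/second).
    """
    n = len(nbytes)
    mean_x = sum(nbytes) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in nbytes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nbytes, seconds))
    slope = sxy / sxx                    # seconds per byte = 1 / bandwidth
    latency = mean_y - slope * mean_x    # intercept = time at zero bytes
    return latency, 1.0 / slope


# Synthetic timings for a double-precision axpy (y = a*x + y):
# each element reads x and y and writes y, i.e. 3 * 8 bytes of traffic.
TRUE_LATENCY = 1.0e-5      # hypothetical: 10 microseconds
TRUE_BANDWIDTH = 1.5e12    # hypothetical: 1.5 TB/s

vector_lengths = [2 ** k for k in range(10, 27)]
sizes_bytes = [3 * 8 * n for n in vector_lengths]
times = [TRUE_LATENCY + b / TRUE_BANDWIDTH for b in sizes_bytes]

latency, bandwidth = fit_latency_bandwidth(sizes_bytes, times)

# Data volume at which the effective bandwidth (bytes / t) reaches half
# of the asymptotic value; under this model it equals latency * bandwidth.
half_performance_bytes = latency * bandwidth

print(f"latency   = {latency:.3e} s")
print(f"bandwidth = {bandwidth:.3e} B/s")
print(f"half-performance size = {half_performance_bytes:.3e} bytes")
```

Under this assumed model, kernels moving far fewer bytes than the half-performance size are dominated by latency rather than bandwidth, which is the regime the abstract identifies as limiting for smaller problem sizes. With real measurements, the timings would carry noise and a weighted or relative-error fit may be more appropriate than the plain least-squares fit used here.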

Tags: Benchmarking, Computer science, Latency, nVidia, nVidia A100, OpenCL, Package, Performance, SYCL

August 21, 2022 by hgpu
