high performance computing on graphics processing units: hgpu.org

Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels

John Lawson, Mehdi Goli, Duncan McBain, Daniel Soutar, Louis Sugy

Codeplay Software Ltd.

arXiv:1904.05347 [cs.PF], (10 Apr 2019)

Package:

SYCL-BLAS: An implementation of BLAS using the SYCL open standard for acceleration on OpenCL devices

Over recent years, heterogeneous hardware has become more prevalent across HPC systems, with over 100 supercomputers in the TOP500 incorporating GPUs or other accelerators. These hardware platforms have different performance characteristics and optimization requirements. To make the most of multiple accelerators, a developer has to provide implementations of their algorithms tuned for each device. Hardware vendors provide libraries targeting their own devices, which perform well but frequently have different API designs, hampering portability. The SYCL programming model allows users to write heterogeneous programs using completely standard C++, so developers have access to the power of C++ templates when developing compute kernels. In this paper we show that by writing highly parameterized kernels for matrix multiplies and convolutions we achieve performance competitive with vendor implementations across different architectures. Furthermore, tuning for new devices amounts to choosing the combinations of kernel parameters that perform best on the hardware.
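The idea of a parameterized kernel can be illustrated with a minimal sketch in plain C++. This is not the paper's SYCL code or the SYCL-BLAS API: the function names `tiled_matmul` and `matmul_for`, the `Device` enum, and the tile sizes are all hypothetical, chosen only to show how template parameters can expose tuning knobs and how per-device tuning reduces to picking one instantiation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameterized kernel: the tile sizes are compile-time
// template parameters, so each combination yields a separately
// optimized instantiation of the same source code.
// Computes C = A * B for row-major A (m x k), B (k x n), C (m x n).
template <std::size_t TileRows, std::size_t TileCols, std::size_t TileK>
void tiled_matmul(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c, std::size_t m, std::size_t n,
                  std::size_t k) {
  static_assert(TileRows > 0 && TileCols > 0 && TileK > 0,
                "tile sizes must be positive");
  assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
  std::fill(c.begin(), c.end(), 0.0f);

  // Walk the output in TileRows x TileCols blocks, accumulating over
  // TileK-wide slices of the shared dimension.
  for (std::size_t i0 = 0; i0 < m; i0 += TileRows) {
    for (std::size_t j0 = 0; j0 < n; j0 += TileCols) {
      for (std::size_t p0 = 0; p0 < k; p0 += TileK) {
        const std::size_t i_end = std::min(i0 + TileRows, m);
        const std::size_t j_end = std::min(j0 + TileCols, n);
        const std::size_t p_end = std::min(p0 + TileK, k);
        for (std::size_t i = i0; i < i_end; ++i) {
          for (std::size_t p = p0; p < p_end; ++p) {
            const float a_ip = a[i * k + p];
            for (std::size_t j = j0; j < j_end; ++j) {
              c[i * n + j] += a_ip * b[p * n + j];
            }
          }
        }
      }
    }
  }
}

// Hypothetical device classes, standing in for real hardware targets.
enum class Device { SmallCache, LargeCache };

// "Tuning" for a device amounts to choosing which instantiation to call.
// The tile sizes below are placeholders, not measured optima.
void matmul_for(Device device, const std::vector<float>& a,
                const std::vector<float>& b, std::vector<float>& c,
                std::size_t m, std::size_t n, std::size_t k) {
  switch (device) {
    case Device::SmallCache:
      tiled_matmul<4, 4, 8>(a, b, c, m, n, k);
      break;
    case Device::LargeCache:
      tiled_matmul<8, 16, 4>(a, b, c, m, n, k);
      break;
  }
}
```

Every instantiation computes the same product; only the blocking differs. In a real SYCL setting the parameters would more likely govern work-group sizes, per-thread register tiles, and local memory use, and the best combination would be found by benchmarking on each target.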

Tags: AMD R9 Nano, ATI, BLAS, Computer science, Deep learning, Linear Algebra, Machine learning, Mathematical Software, OpenCL, Package, Performance, performance portability, SYCL

April 14, 2019 by hgpu
