high performance computing on graphics processing units: hgpu.org

Providing performance portable numerics for Intel GPUs

Yu-Hsiang M. Tsai, Terry Cojean, Hartwig Anzt

Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Baden-Württemberg, Germany

Concurrency and Computation: Practice and Experience published by John Wiley & Sons Ltd, e7400, 2022

DOI:10.1002/cpe.7400

Package: Ginkgo: a high-performance linear algebra library for manycore systems

With discrete Intel GPUs entering the high-performance computing landscape, there is an urgent need for production-ready software stacks for these platforms. In this article, we report how we enable the Ginkgo math library to execute on Intel GPUs by developing a kernel backend based on the DPC++ programming environment. We discuss conceptual differences between the CUDA and DPC++ programming models and describe workflows for simplified code conversion. We evaluate the performance of basic and advanced sparse linear algebra routines available in Ginkgo’s DPC++ backend against hardware-specific performance bounds and compare them with routines providing the same functionality that ship with Intel’s oneMKL vendor library.
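To make the idea of a hardware-specific performance bound concrete, here is a minimal sketch for one basic sparse routine, the sparse matrix-vector product (SpMV) in CSR format. It is not taken from the article or from Ginkgo: the function names, the assumption that every array is moved through memory exactly once, and the bandwidth value passed in are all illustrative assumptions, and the authors' own performance model may differ.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Reference y = A @ x for a matrix A stored in CSR format (illustrative helper)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Entries of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def spmv_bandwidth_bound_gflops(n_rows, n_cols, nnz, bandwidth_gb_s,
                                value_bytes=8, index_bytes=4):
    """Upper bound on CSR SpMV throughput if memory traffic is the only limit.

    Assumption: each array is read or written exactly once, i.e. the input
    vector x is perfectly cached. Real kernels usually move more data.
    """
    bytes_moved = (
        nnz * (value_bytes + index_bytes)  # nonzero values and column indices
        + (n_rows + 1) * index_bytes       # row pointers
        + n_cols * value_bytes             # read x
        + n_rows * value_bytes             # write y
    )
    flops = 2 * nnz  # one multiply and one add per stored entry
    # (flop / byte) * (GB / s) = GFLOP / s
    return flops / bytes_moved * bandwidth_gb_s


# 3x3 example matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] applied to x = [1, 2, 3].
y = csr_spmv([0, 2, 3, 5], [0, 2, 1, 0, 2],
             [4.0, 1.0, 3.0, 2.0, 5.0], [1.0, 2.0, 3.0])

# Hypothetical device with 100 GB/s of memory bandwidth (placeholder value).
bound = spmv_bandwidth_bound_gflops(n_rows=3, n_cols=3, nnz=5, bandwidth_gb_s=100.0)
```

Under these assumptions, a measured SpMV rate can be read as a fraction of `bound` for the device in question. With double-precision values and 32-bit indices the bound stays below one sixth of the bandwidth figure, since each stored entry costs at least 12 bytes for 2 floating-point operations.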

Tags: AMD Radeon Instinct MI100, ATI, Computer science, CUDA, Linear Algebra, nVidia, nVidia A100, OpenCL, Package, performance portability, Sparse, SYCL

October 30, 2022 by hgpu
