high performance computing on graphics processing units: hgpu.org

Performance portability study of epistasis detection using SYCL on NVIDIA GPU

Zheming Jin, Jeffrey S. Vetter

Oak Ridge National Laboratory

Oak Ridge National Laboratory, 2022

DOI:10.1145/3535508.3545591

We describe our experience converting a CUDA implementation of a high-order epistasis detection algorithm to SYCL. Our goal is to give application and compiler developers a detailed description of migration paths between CUDA and SYCL. Evaluating the CUDA and SYCL applications on an NVIDIA V100 GPU, we find that loop unrolling must be applied manually to the SYCL kernel to obtain comparable performance. The performance of the SYCL group reduce function, an alternative to the CUDA warp-based reduction, depends on the problem and work-group sizes. The 64-bit popcount operation implemented with a tree of adders is slightly faster than the built-in popcount operation. When the number of OpenMP threads is four, the highest performance of the SYCL application is comparable to that of the CUDA application.
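To make the "tree of adders" popcount concrete, here is a minimal sketch in plain C++ of the classic bit-parallel formulation. It is an illustration only: the function name `popcount64_tree` is hypothetical, and the abstract does not show the paper's actual kernel code, which may differ. Each step adds neighbouring bit fields in parallel, doubling the field width until a single 64-bit sum remains.

```cpp
#include <cstdint>

// Hypothetical helper (not taken from the paper): counts the set bits of a
// 64-bit word with a tree of adders. Each line sums adjacent fields of
// width 1, 2, 4, 8, 16 and 32 bits, so six add stages cover all 64 bits.
inline std::uint64_t popcount64_tree(std::uint64_t x) {
    x = (x & 0x5555555555555555ULL) + ((x >> 1)  & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2)  & 0x3333333333333333ULL);
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL);
    x = (x & 0x00FF00FF00FF00FFULL) + ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = (x & 0x0000FFFF0000FFFFULL) + ((x >> 16) & 0x0000FFFF0000FFFFULL);
    x = (x & 0x00000000FFFFFFFFULL) + ((x >> 32) & 0x00000000FFFFFFFFULL);
    return x;  // number of set bits, in the range 0..64
}
```

The function uses only integer masks, shifts and additions, so the same body should be usable inside a CUDA or SYCL kernel. Whether it beats the built-in popcount on a given device is an empirical question of the kind the paper measures.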

Tags: Computer science, CUDA, Genomics, nVidia, performance portability, SYCL, Tesla V100

October 9, 2022 by hgpu
