high performance computing on graphics processing units: hgpu.org

COX: Exposing CUDA Warp-Level Functions to CPUs

Ruobing Han, Jaewon Lee, Jaewoong Sim, Hyesoon Kim

Georgia Institute of Technology, USA

ACM Transactions on Architecture and Code Optimization, 2022

DOI:10.1145/3554736

As CUDA has become the de facto programming language for data-parallel applications such as high-performance computing and machine learning, running CUDA on other platforms has become a compelling option. Although several efforts have attempted to support CUDA on devices other than NVIDIA GPUs, the extra translation steps mean that this support always lags a few years behind CUDA’s latest features. In particular, the new CUDA programming model exposes the warp concept in the programming language, which greatly changes how CUDA code should be mapped to CPU programs. This paper proposes hierarchical collapsing, a technique that correctly supports CUDA warp-level functions on CPUs. To verify hierarchical collapsing, we build a framework, COX, that executes CUDA source code on a CPU backend. With hierarchical collapsing, 90% of the kernels in the CUDA SDK samples can be executed on CPUs, a much higher share than in previous work (68%). We also evaluate performance with benchmarks for real applications and show that hierarchical collapsing generally produces CPU programs with performance comparable to or even higher than that of previous projects.
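
To make the mapping problem concrete, the following is a minimal C++ sketch of how a warp-level sum reduction might be emulated on a CPU with nested loops: an outer loop over warps and inner loops over the 32 lanes of each warp, split at every shuffle so that all lanes publish their value before any lane reads a neighbor's. It is an illustration only and not COX's actual generated code; the function name `warp_sums`, the `exchanged` buffer, and the loop layout are assumptions made for this example.

```cpp
#include <array>
#include <vector>

constexpr int kWarpSize = 32;  // lanes per warp on NVIDIA GPUs

// Hypothetical CPU emulation of this CUDA warp reduction:
//   for (int off = 16; off > 0; off /= 2)
//       val += __shfl_down_sync(0xffffffff, val, off);
// Assumes input.size() is a multiple of kWarpSize.
// Returns one sum per warp (the value lane 0 would hold on a GPU).
std::vector<int> warp_sums(const std::vector<int>& input) {
    const int num_warps = static_cast<int>(input.size()) / kWarpSize;
    std::vector<int> result(num_warps);

    for (int w = 0; w < num_warps; ++w) {  // loop over warps
        std::array<int, kWarpSize> val{};        // per-lane register "val"
        std::array<int, kWarpSize> exchanged{};  // values visible to other lanes

        for (int lane = 0; lane < kWarpSize; ++lane)  // loop over lanes
            val[lane] = input[w * kWarpSize + lane];

        for (int off = kWarpSize / 2; off > 0; off /= 2) {
            // Every lane publishes its value before any lane reads one.
            for (int lane = 0; lane < kWarpSize; ++lane)
                exchanged[lane] = val[lane];

            // Every lane reads from lane + off; an out-of-range source
            // returns the caller's own value, as __shfl_down_sync does.
            for (int lane = 0; lane < kWarpSize; ++lane) {
                const int src = lane + off;
                val[lane] += (src < kWarpSize) ? exchanged[src] : exchanged[lane];
            }
        }
        result[w] = val[0];
    }
    return result;
}
```

Calling `warp_sums` on a 64-element input treats it as one block of two warps and returns two partial sums. The point of the sketch is that a single flat loop over all threads would not suffice here, because each shuffle needs every lane of the same warp to reach it before any lane continues.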

Tags: Benchmarking, Compilers, Computer science, CUDA, HIP, nVidia

August 7, 2022 by hgpu
