CuPBoP: CUDA for Parallelized and Broad-range Processors
Georgia Institute of Technology
arXiv:2206.07896 [cs.DC], (16 Jun 2022)
@misc{https://doi.org/10.48550/arxiv.2206.07896,
doi={10.48550/ARXIV.2206.07896},
url={https://arxiv.org/abs/2206.07896},
author={Han, Ruobing and Chen, Jun and Garg, Bhanu and Young, Jeffrey and Sim, Jaewoong and Kim, Hyesoon},
keywords={Distributed, Parallel, and Cluster Computing (cs.DC), Hardware Architecture (cs.AR), FOS: Computer and information sciences},
title={CuPBoP: CUDA for Parallelized and Broad-range Processors},
publisher={arXiv},
year={2022},
copyright={arXiv.org perpetual, non-exclusive license}
}
CUDA is one of the most popular choices for GPU programming, but it can only be executed on NVIDIA GPUs. Executing CUDA on non-NVIDIA devices not only benefits the hardware community, but also allows data-parallel computation in heterogeneous systems. To make CUDA programs portable, some researchers have proposed using source-to-source translators to translate CUDA to portable programming languages that can be executed on non-NVIDIA devices. However, most CUDA translators require additional manual modifications to the translated code, which imposes a heavy workload on developers. In this paper, CuPBoP is proposed to execute CUDA on non-NVIDIA devices without relying on any portable programming languages. Compared with existing work that executes CUDA on non-NVIDIA devices, CuPBoP does not require manual modification of the CUDA source code, yet it achieves the highest coverage on the Rodinia benchmark (69.6%), well above existing frameworks (56.6%). In particular, for CPU backends, CuPBoP supports several ISAs (e.g., X86, RISC-V, AArch64) and delivers comparable or even higher performance than other projects. We also compare and analyze the performance of CuPBoP, manually optimized OpenMP/MPI programs, and CUDA programs on the latest Ampere architecture GPU, and outline future directions for supporting CUDA programs on non-NVIDIA devices with high performance.
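To make the idea of running a CUDA kernel on a CPU more concrete, the following is a minimal conceptual sketch in C++. It is not CuPBoP's implementation, and the abstract does not describe the framework's internals. It only illustrates the general principle that a CUDA grid of blocks and threads can be emulated with nested loops. All names here (`KernelCtx`, `vec_add_kernel`, `launch_on_cpu`, `vec_add`) and the block size of 4 are hypothetical, chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's built-in index variables (1-D case only).
struct KernelCtx {
    unsigned blockIdx;   // index of the current block within the grid
    unsigned blockDim;   // number of threads per block
    unsigned threadIdx;  // index of the current thread within its block
};

// Body of a CUDA-style vector-add kernel, written as a plain function.
// In CUDA the index would be: blockIdx.x * blockDim.x + threadIdx.x.
void vec_add_kernel(const KernelCtx& ctx, const float* a, const float* b,
                    float* c, std::size_t n) {
    std::size_t i =
        static_cast<std::size_t>(ctx.blockIdx) * ctx.blockDim + ctx.threadIdx;
    if (i < n) {  // guard: the last block may contain threads past the end
        c[i] = a[i] + b[i];
    }
}

// Illustrative CPU "launch": every (block, thread) pair becomes one loop
// iteration. A real runtime could distribute blocks across CPU cores; this
// sketch runs them sequentially for simplicity.
void launch_on_cpu(unsigned gridDim, unsigned blockDim, const float* a,
                   const float* b, float* c, std::size_t n) {
    for (unsigned blk = 0; blk < gridDim; ++blk) {
        for (unsigned t = 0; t < blockDim; ++t) {
            vec_add_kernel(KernelCtx{blk, blockDim, t}, a, b, c, n);
        }
    }
}

// Convenience wrapper; assumes a and b have the same length.
std::vector<float> vec_add(const std::vector<float>& a,
                           const std::vector<float>& b) {
    const std::size_t n = a.size();
    const unsigned blockDim = 4;  // arbitrary block size for the example
    const unsigned gridDim =
        static_cast<unsigned>((n + blockDim - 1) / blockDim);  // round up
    std::vector<float> c(n);
    launch_on_cpu(gridDim, blockDim, a.data(), b.data(), c.data(), n);
    return c;
}
```

Calling `vec_add` on two equal-length vectors returns their element-wise sum. The sketch ignores features such as shared memory and `__syncthreads()`, which any real CUDA-on-CPU system has to handle.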
June 19, 2022 by hgpu