MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores

John A. Stratton, Sam S. Stone, and Wen-mei W. Hwu

Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign

IMPACT Technical Report, IMPACT-08-01, University of Illinois at Urbana-Champaign, Center for Reliable and High-Performance Computing

Source code package: MCUDA translation framework

The CUDA programming model, which is based on an extended ANSI C language and a runtime environment, allows the programmer to explicitly specify data-parallel computation. NVIDIA developed CUDA to open the architecture of its graphics accelerators to more general applications, but did not provide an efficient mapping to execute the programming model on any other architecture. This document describes Multicore-CUDA (MCUDA), a system that efficiently maps the CUDA programming model to a multicore CPU architecture. The major contribution of this work is the source-to-source translation process that converts CUDA code into standard C that interfaces to a runtime library for parallel execution. We apply the MCUDA framework to some CUDA applications previously shown to have high performance on a GPU, and demonstrate high efficiency when executing these applications on a multicore CPU architecture. The thread-level parallelism, data locality, and computational regularity of the code as expressed in the CUDA model achieve much of the benefit of hand-tuning an application for the CPU architecture. With the MCUDA framework, it is now possible to write data-parallel code in a single programming model for efficient execution on CPU or GPU architectures.
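To make the idea of translating a CUDA kernel into standard C more concrete, here is a minimal sketch. It is not output of the MCUDA tool, and the abstract does not spell out the transformation. The sketch assumes that the per-thread body of a kernel can be wrapped in an explicit loop over thread indices, and that thread blocks are handed out by a launcher. The names `dim1`, `vec_add_block`, and `vec_add_launch` are hypothetical, and the launcher here runs blocks serially where a real runtime library would presumably distribute them across CPU cores.

```c
#include <assert.h>

/* Hypothetical one-dimensional stand-in for CUDA's built-in index types. */
typedef struct { int x; } dim1;

/* Original CUDA kernel, shown for reference only:
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block as a plain C function (assumed translation style):
 * the implicit per-thread execution becomes an explicit loop over threadIdx. */
static void vec_add_block(const float *a, const float *b, float *c, int n,
                          dim1 blockIdx, dim1 blockDim)
{
    dim1 threadIdx;
    for (threadIdx.x = 0; threadIdx.x < blockDim.x; threadIdx.x++) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }
}

/* Hypothetical stand-in for a kernel launch. Blocks are independent, so this
 * loop is where a runtime library could assign blocks to worker threads;
 * this sketch simply iterates over them in order. */
void vec_add_launch(const float *a, const float *b, float *c, int n,
                    int grid_dim, int block_dim)
{
    assert(grid_dim > 0 && block_dim > 0);
    dim1 blockDim = { block_dim };
    for (int bx = 0; bx < grid_dim; bx++) {
        dim1 blockIdx = { bx };
        vec_add_block(a, b, c, n, blockIdx, blockDim);
    }
}
```

Calling `vec_add_launch(a, b, c, n, grid_dim, block_dim)` plays the role of `vec_add<<<grid_dim, block_dim>>>(a, b, c, n)`. This kernel contains no barrier synchronization, so a single thread loop is enough; kernels that synchronize threads within a block would need more care than this sketch shows.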

Tags: Compilers, Computer science, CUDA, High-level Languages, nVidia, Package

February 20, 2011 by hgpu
