MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs

John Stratton, Sam Stone, Wen-Mei Hwu
Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign
Languages and Compilers for Parallel Computing (2008), pp. 16-30

CUDA is a data-parallel programming model that supports several key abstractions for writing applications: thread blocks, hierarchical memory and barrier synchronization. This model has proven effective for programming GPUs. In this paper we describe a framework called MCUDA, which allows CUDA programs to be executed efficiently on shared-memory multi-core CPUs. Our framework consists of a set of source-level compiler transformations and a runtime system for parallel execution. Preserving program semantics, the compiler transforms threaded SPMD functions into explicit loops, performs loop fission to eliminate barrier synchronizations, and converts scalar references to thread-local data into replicated vector references. We describe an implementation of this framework and demonstrate performance approaching that achievable from manually parallelized and optimized C code. With these results, we argue that CUDA can be an effective data-parallel programming model for more than just GPU architectures.
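The three compiler transformations named above can be illustrated on a tiny kernel. The sketch below is a hand translation written for this page, not MCUDA's actual output; the kernel, `BLOCK_DIM`, `mirror_diff_block` and `launch_mirror_diff` are all hypothetical names. It shows (1) the per-thread body wrapped in an explicit loop over `tid`, (2) the loop split (fission) at the `__syncthreads()` barrier so that every logical thread finishes the first phase before any starts the second, and (3) the thread-local scalar `v`, which is live across the barrier, expanded into an array with one slot per logical thread. The sequential loop over blocks is only a stand-in for the runtime system, which is assumed here to distribute blocks across CPU cores.

```c
#define BLOCK_DIM 4 /* hypothetical number of threads per block */

/*
 * Original CUDA kernel, for reference only:
 *
 *   __global__ void mirror_diff(float *out, const float *in) {
 *       __shared__ float buf[BLOCK_DIM];
 *       int i = blockIdx.x * BLOCK_DIM + threadIdx.x;
 *       float v = in[i] * 2.0f;      // thread-local, live across the barrier
 *       buf[threadIdx.x] = v;
 *       __syncthreads();
 *       out[i] = buf[BLOCK_DIM - 1 - threadIdx.x] - v;
 *   }
 */

/* Hand-translated body of one thread block (hypothetical name). */
static void mirror_diff_block(float *out, const float *in, int block_idx) {
    float buf[BLOCK_DIM]; /* __shared__ array: one copy per block invocation */
    float v[BLOCK_DIM];   /* scalar v replicated: one slot per logical thread */

    /* Thread loop 1: every statement before __syncthreads(). */
    for (int tid = 0; tid < BLOCK_DIM; ++tid) {
        int i = block_idx * BLOCK_DIM + tid;
        v[tid] = in[i] * 2.0f;
        buf[tid] = v[tid];
    }

    /* No barrier call is needed: loop 1 has completed for all tids before
       loop 2 begins, which is the ordering __syncthreads() guaranteed. */

    /* Thread loop 2: every statement after __syncthreads(). */
    for (int tid = 0; tid < BLOCK_DIM; ++tid) {
        int i = block_idx * BLOCK_DIM + tid; /* recomputed, not replicated */
        out[i] = buf[BLOCK_DIM - 1 - tid] - v[tid];
    }
}

/* Sequential stand-in for the runtime: one call per block in the grid. */
void launch_mirror_diff(float *out, const float *in, int grid_dim) {
    for (int b = 0; b < grid_dim; ++b)
        mirror_diff_block(out, in, b);
}
```

With `in = {0, 1, 2, 3, 4, 5, 6, 7}` and `grid_dim = 2`, `launch_mirror_diff` fills `out` with `6, 2, -2, -6` for each of the two blocks. The index `i` is recomputed from `tid` in the second loop instead of being replicated like `v`; that is a choice made here for brevity, since it is cheap to derive. Because `buf` is a local array, each block invocation gets its own copy, so blocks could run concurrently on different cores without sharing it.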