high performance computing on graphics processing units: hgpu.org


MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs

John Stratton, Sam Stone, Wen-Mei Hwu

Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign

Languages and Compilers for Parallel Computing (2008), pp. 16-30

DOI:10.1007/978-3-540-89740-8_2

@article{stratton2008mcuda,
  title={MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs},
  author={Stratton, J. and Stone, S. and Hwu, W.},
  journal={Languages and Compilers for Parallel Computing},
  pages={16--30},
  year={2008},
  publisher={Springer}
}


Package: MCUDA translation framework


CUDA is a data-parallel programming model that supports several key abstractions – thread blocks, hierarchical memory and barrier synchronization – for writing applications. This model has proven effective in programming GPUs. In this paper we describe a framework called MCUDA, which allows CUDA programs to be executed efficiently on shared-memory multi-core CPUs. Our framework consists of a set of source-level compiler transformations and a runtime system for parallel execution. Preserving program semantics, the compiler transforms threaded SPMD functions into explicit loops, performs fission to eliminate barrier synchronizations, and converts scalar references to thread-local data to replicated vector references. We describe an implementation of this framework and demonstrate performance approaching that achievable from manually parallelized and optimized C code. With these results, we argue that CUDA can be an effective data-parallel programming model for more than just GPU architectures.
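The three transformations named in the abstract (thread loops, loop fission at barriers, and replication of thread-local scalars) can be illustrated with a small hand-written sketch in plain C. This is not output from the MCUDA tool. The kernel `neighbor_sum`, the fixed block size `BLOCK_DIM`, and the helpers `neighbor_sum_block` and `neighbor_sum_grid` are hypothetical names chosen for illustration, and the blocks are run one after another here, whereas the paper's runtime system executes them in parallel.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_DIM 8 /* hypothetical fixed number of threads per block */

/* CUDA-style kernel being translated (reference only, not compiled):
 *
 *   __global__ void neighbor_sum(const float *in, float *out) {
 *       __shared__ float tile[BLOCK_DIM];
 *       int tid = threadIdx.x;
 *       int gid = blockIdx.x * blockDim.x + tid;
 *       float v = 2.0f * in[gid];      // thread-local scalar
 *       tile[tid] = v;
 *       __syncthreads();               // barrier
 *       out[gid] = v + tile[(tid + 1) % BLOCK_DIM];
 *   }
 */

/* One thread block executed by a single CPU thread. */
void neighbor_sum_block(const float *in, float *out, int block_idx)
{
    float tile[BLOCK_DIM]; /* shared memory becomes an ordinary local array */
    float v[BLOCK_DIM];    /* scalar v is live across the barrier, so it is
                              replicated into one slot per logical thread */
    int tid;

    /* Thread loop 1: the kernel code that precedes the barrier. */
    for (tid = 0; tid < BLOCK_DIM; ++tid) {
        int gid = block_idx * BLOCK_DIM + tid;
        v[tid] = 2.0f * in[gid];
        tile[tid] = v[tid];
    }

    /* The barrier sat here. Splitting the thread loop at this point
     * (loop fission) guarantees that every logical thread has written
     * tile[] before any logical thread reads it. */

    /* Thread loop 2: the kernel code that follows the barrier. */
    for (tid = 0; tid < BLOCK_DIM; ++tid) {
        int gid = block_idx * BLOCK_DIM + tid;
        out[gid] = v[tid] + tile[(tid + 1) % BLOCK_DIM];
    }
}

/* Sequential stand-in for a runtime that would distribute the
 * independent blocks across CPU cores. */
void neighbor_sum_grid(const float *in, float *out, int num_blocks)
{
    int b;
    assert(in != NULL && out != NULL && num_blocks >= 0);
    for (b = 0; b < num_blocks; ++b) {
        neighbor_sum_block(in, out, b);
    }
}
```

Calling `neighbor_sum_grid(in, out, n)` on arrays of `n * BLOCK_DIM` floats processes `n` blocks. Only `v` needs replication in this sketch because it is the only thread-local value used on both sides of the barrier; `gid` is recomputed inside each loop and can stay a scalar.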

Tags: Computer science, CUDA, High-level Languages, nVidia, Package

December 12, 2010 by hgpu

