high performance computing on graphics processing units: hgpu.org

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs

John A. Stratton, Vinod Grover, Jaydeep Marathe, Bastiaan Aarts, Mike Murphy, Ziang Hu, Wen-mei W. Hwu

NVIDIA Corporation / University of Illinois at Urbana-Champaign, Champaign, IL, USA

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, CGO ’10, 2010

DOI:10.1145/1772954.1772971

@inproceedings{stratton2010efficient,
  title={Efficient compilation of fine-grained {SPMD}-threaded programs for multicore {CPUs}},
  author={Stratton, J.A. and Grover, V. and Marathe, J. and Aarts, B. and Murphy, M. and Hu, Z. and Hwu, W.W.},
  booktitle={Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization},
  pages={111--119},
  year={2010},
  organization={ACM}
}

In this paper we describe techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms. Programs developed for manycore processors typically express finer thread-level parallelism than is appropriate for multicore platforms. We describe options for implementing fine-grained threading in software, and find that reasonable restrictions on the synchronization model enable significant optimizations and performance improvements over a baseline approach. We evaluate these techniques in a production-level compiler and runtime for the CUDA programming model targeting modern CPUs. Applications tested with our tool often showed performance parity with the compiled C version of the application for single-thread performance. With modest coarse-grained multithreading typical of today’s CPU architectures, an average of 3.4x speedup on 4 processors was observed across the test applications.
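To make the idea of implementing fine-grained threading in software concrete, the following is a minimal sketch in C, not the compiler output described in the paper. It shows one commonly used way to run a block of logical SPMD threads on a single CPU thread: the kernel body is wrapped in a loop over logical thread indices, and the loop is split at the barrier so that every logical thread finishes the first region before any thread starts the second. The names `run_block`, `run_grid`, `BLOCK_DIM`, and `tile` are hypothetical, and the sketch assumes the barrier is reached uniformly by all logical threads in the block, which is one example of a restriction on the synchronization model.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_DIM 8  /* hypothetical number of logical threads per block */

/* Executes one logical thread block on a single CPU thread.
 *
 * Original SPMD body, written per logical thread `tid`:
 *     tile[tid] = in[base + tid];
 *     barrier();
 *     out[base + tid] = tile[(tid + 1) % BLOCK_DIM];
 */
static void run_block(const int *in, int *out, size_t block_id)
{
    int tile[BLOCK_DIM];                 /* stands in for per-block shared memory */
    size_t base = block_id * BLOCK_DIM;  /* first element owned by this block */

    /* Region before the barrier, run once for every logical thread. */
    for (size_t tid = 0; tid < BLOCK_DIM; ++tid)
        tile[tid] = in[base + tid];

    /* The barrier becomes the boundary between the two loops: all writes
     * to tile[] above are complete before any read below. */

    /* Region after the barrier: each logical thread reads its neighbour. */
    for (size_t tid = 0; tid < BLOCK_DIM; ++tid)
        out[base + tid] = tile[(tid + 1) % BLOCK_DIM];
}

/* Runs every block in sequence. Blocks do not depend on each other here,
 * so this outer loop is where coarse-grained CPU threads could be used. */
static void run_grid(const int *in, int *out, size_t num_blocks)
{
    assert(in != NULL && out != NULL);
    for (size_t b = 0; b < num_blocks; ++b)
        run_block(in, out, b);
}
```

Calling `run_grid(in, out, 2)` on arrays of 16 elements processes two blocks of eight logical threads each. Without the split at the barrier, a single fused loop would read `tile[(tid + 1) % BLOCK_DIM]` before the neighbouring logical thread had written it.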

Tags: Algorithms, Computer science, CUDA, nVidia, OpenCL, Programming techniques

February 20, 2011 by hgpu
