high performance computing on graphics processing units: hgpu.org


SIMD Re-Convergence At Thread Frontiers

Gregory Diamos, Andrew Kerr, Haicheng Wu, Sudhakar Yalamanchili, Benjamin Ashbaugh, Subramaniam Maiyuran

Georgia Institute of Technology

The 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 44), 2011

@techreport{diamos2011simd,
  title={SIMD Re-Convergence At Thread Frontiers},
  author={Diamos, G. and Ashbaugh, B. and Maiyuran, S. and Wu, H. and Kerr, A. and Yalamanchili, S.},
  year={2011},
  institution={Technical report}
}


Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA, OpenCL, and DirectX Compute. The impact of branch divergence can be quite different depending upon whether the program's control flow is structured or unstructured. In this paper, we show that unstructured control flow occurs frequently in applications and can lead to significant code expansion when executed using existing approaches for handling branch divergence. This paper proposes a new technique for automatically mapping arbitrary control flow onto SIMD processors that relies on the concept of a thread frontier, which is a bounded region of the program containing all threads that have branched away from the current warp. This technique is evaluated on a GPU emulator configured to model i) a commodity GPU (Intel Sandy Bridge), and ii) custom hardware support not realized in current GPU architectures. It is shown that this new technique performs identically to the best existing method for structured control flow, and re-converges at the earliest possible point when executing unstructured control flow. This leads to i) reductions of between 1.5% and 633.2% in dynamic instruction counts for several real applications, ii) simplification of the compilation process, and iii) the ability to efficiently add high-level unstructured programming constructs (e.g., exceptions) to existing data-parallel languages.
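
The difference between re-converging at a post-dominator and re-converging earlier can be illustrated with a toy model. The sketch below is not the paper's hardware or compiler algorithm: the control flow graph (a short-circuit `if (c1 || c2) S else T`), the block names, the hand-written `IPDOM` table, and the priority order `ORDER` are all assumptions made for illustration. It counts how many basic blocks a three-thread warp executes under each policy.

```python
# Toy warp simulator; every name here is hypothetical, chosen for illustration.
# Each block maps a thread's state to the next block that thread executes.
CFG = {
    "entry": lambda t: "S" if t["c1"] else "B2",
    "B2":    lambda t: "S" if t["c2"] else "T",
    "S":     lambda t: "exit",
    "T":     lambda t: "exit",
    "exit":  lambda t: None,
}
# Assumed immediate post-dominators of the two branching blocks.
IPDOM = {"entry": "exit", "B2": "exit"}
# Assumed block priority order (earlier entry = scheduled first).
ORDER = ["entry", "B2", "S", "T", "exit"]
# Three threads that take three different paths through the branches.
THREADS = [
    {"c1": True,  "c2": False},
    {"c1": False, "c2": True},
    {"c1": False, "c2": False},
]


def split(block, active):
    """Group the active thread ids by the successor block each one takes."""
    groups = {}
    for tid in active:
        groups.setdefault(CFG[block](THREADS[tid]), []).append(tid)
    return groups


def run_pdom(block, active, stop=None, trace=None):
    """Run each divergent path to the branch's post-dominator, then rejoin."""
    trace = [] if trace is None else trace
    while block != stop:
        trace.append(block)
        groups = split(block, active)
        if len(groups) == 1:
            block = next(iter(groups))
        else:
            join = IPDOM[block]
            for target, subset in groups.items():
                run_pdom(target, subset, join, trace)
            block = join
    return trace


def run_frontier(active):
    """Always run the highest-priority waiting block, merging threads there."""
    waiting = {"entry": list(active)}
    trace = []
    while waiting:
        block = min(waiting, key=ORDER.index)
        group = waiting.pop(block)
        trace.append(block)
        for target, subset in split(block, group).items():
            if target is not None:
                waiting.setdefault(target, []).extend(subset)
    return trace


if __name__ == "__main__":
    print("post-dominator:", run_pdom("entry", [0, 1, 2]))
    print("priority order:", run_frontier([0, 1, 2]))
```

In this toy case the post-dominator policy executes block `S` twice, once for each group of threads that reaches it, while the priority-ordered schedule lets those threads merge at `S` and executes it once.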

Tags: Computer science, CUDA, nVidia, Performance, Programming Languages, PTX

October 31, 2011 by hgpu
