high performance computing on graphics processing units: hgpu.org

Dynamic loop vectorization for executing OpenCL kernels on CPUs

Izzat El Hajj

University of Illinois at Urbana-Champaign

University of Illinois at Urbana-Champaign, 2014

@article{el2014dynamic,
  title={Dynamic loop vectorization for executing OpenCL kernels on CPUs},
  author={El Hajj, Izzat},
  year={2014}
}

Heterogeneous computing platforms are becoming increasingly important in supercomputing. Many systems now integrate CPUs and GPUs that cooperate on a single node, and much effort is invested in tuning GPU kernels. However, some systems may not have GPUs, or their GPUs may be busy, and maintaining separate versions of the same code for GPUs and CPUs is expensive. It would therefore be ideal to retarget GPU-optimized kernels to run efficiently on a CPU. Many efforts have been made to compile OpenCL kernels to run efficiently on CPUs. Such approaches typically run work-groups in parallel on different CPU threads and execute the work-items within a work-group in a single thread, either serially via loop-based serialization or in parallel via SIMD vectorization. SIMD vectorization is particularly difficult when control divergence is present. This thesis proposes a technique for transforming divergent loops in OpenCL kernels so that vectorization opportunities can be extracted when possible and memory access patterns can be improved. The transformations presented show promising speedups for kernels that follow GPU programming best practices, and slowdowns for kernels that do not.
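To make the loop-based serialization mentioned in the abstract concrete, here is a minimal sketch in plain C. It is not taken from the thesis: the `scale` kernel and the names `run_work_group` and `launch_kernel` are hypothetical, chosen only for illustration. Each work-group becomes one function call (which a real runtime could hand to a separate CPU thread), and the work-items inside it become iterations of an ordinary loop.

```c
#include <stddef.h>

/* Hypothetical OpenCL kernel being emulated (illustrative, not from the thesis):
 *
 *   __kernel void scale(__global const float *in, __global float *out, int n) {
 *       int i = get_global_id(0);
 *       if (i < n) out[i] = 2.0f * in[i];
 *   }
 */

/* Execute one work-group on the calling thread. The work-items are
 * serialized into a loop, so local_id plays the role of get_local_id(0). */
void run_work_group(size_t group_id, size_t local_size,
                    const float *in, float *out, size_t n)
{
    for (size_t local_id = 0; local_id < local_size; ++local_id) {
        /* Equivalent of get_global_id(0) for a 1-D launch. */
        size_t i = group_id * local_size + local_id;

        /* Bounds guard from the kernel. The branch outcome only varies
         * between work-items in the last, partially filled work-group. */
        if (i < n)
            out[i] = 2.0f * in[i];
    }
}

/* Launch every work-group. They run sequentially here; because work-groups
 * are independent, a runtime could distribute these calls across threads. */
void launch_kernel(size_t num_groups, size_t local_size,
                   const float *in, float *out, size_t n)
{
    for (size_t g = 0; g < num_groups; ++g)
        run_work_group(g, local_size, in, out, n);
}
```

Calling `launch_kernel(3, 4, in, out, 10)` covers 12 work-items, of which the last two are masked off by the guard. In this simple case the inner loop has a fixed trip count and unit-stride accesses, so a compiler may be able to vectorize it; kernels whose work-items take different paths through their own loops are the harder case the abstract refers to.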

Tags: Computer science, Heterogeneous systems, OpenCL, Performance, Thesis

June 14, 2014 by hgpu
