high performance computing on graphics processing units: hgpu.org


DL: A data layout transformation system for heterogeneous computing

I-Jui Sung, Geng Daniel Liu, Wen-Mei W. Hwu

Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

IEEE Innovative Parallel Computing: Foundations & Applications of GPU, Manycore, and Heterogeneous Systems (INPAR), 2012

@inproceedings{sung2012dl,
  title={DL: A data layout transformation system for heterogeneous computing},
  author={Sung, I.J. and Liu, G.D. and Hwu, W.M.W.},
  booktitle={Innovative Parallel Computing (InPar), 2012},
  pages={1--11},
  year={2012},
  organization={IEEE}
}


Package: libmarshal


For many-core architectures such as GPUs, efficient off-chip memory access is crucial to high performance, because applications are often limited by off-chip memory bandwidth. Transforming the data layout is an effective way to reshape access patterns and improve off-chip memory behavior, but several challenges have limited the use of automated data layout transformation systems on GPUs: how to efficiently handle arrays of aggregates, and how to transparently marshal data between the layouts required by different performance-sensitive kernels and by legacy host code. While GPUs have higher memory bandwidth and are natural candidates for marshaling data between layouts, their relatively constrained memory capacity compared to that of the CPU means that any practical layout transformation system must consider not only the time cost of marshaling but also its space overhead. This paper presents DL, a practical GPU data layout transformation system that addresses these problems. First, it proposes a novel approach to laying out arrays of aggregate types across GPU and CPU architectures that further improves memory parallelism and kernel performance beyond what human programmers achieve today using discrete arrays. The proposed layout can be derived in situ from the traditional Array of Structures, Structure of Arrays, and adjacent Discrete Arrays layouts used by programmers. Second, DL has a run-time library implemented in OpenCL that transparently and efficiently converts, or marshals, data to accommodate application components with different data layout requirements. We present the insights that led to the design of this highly efficient run-time marshaling library. In particular, the in situ transformation implemented in the library is comparable to or faster than optimized traditional out-of-place transformations, while avoiding doubling GPU DRAM usage.
Third, we show experimentally that the new layout approach leads to substantial performance improvement at the application level, even when all marshaling costs are taken into account.

Tags: Computer science, Heterogeneous systems, nVidia, nVidia GeForce GTX 480, OpenCL, Package

November 1, 2012 by hgpu

