high performance computing on graphics processing units: hgpu.org


Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs

Julien Jaeger, Denis Barthou

Parallelisme, Reseaux, Systemes d’information, Modelisation (PRISM), CNRS : UMR8144 – Universite de Versailles Saint-Quentin-en-Yvelines

High Performance Computing conference (2012), hal-00793201, (28 February 2013)

@inproceedings{jaeger2012automatic,
  title={Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs},
  author={Jaeger, Julien and Barthou, Denis and others},
  booktitle={IEEE Proceedings of High Performance Computing conference},
  pages={1--10},
  year={2012}
}


Stencil-based computation on structured grids is a kernel at the heart of a large number of scientific applications. The variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a high performance computing library. With the multiplication of cores on a single chip, meeting architectural alignment requirements has become an even more important key to high performance. In addition to vector accesses, data layout optimization must also consider concurrent parallel accesses. In this paper, we develop a strategy to automatically generate stencil codes for multicore vector architectures, searching for the best possible data layout to address architectural alignment problems. We introduce a new method for aligning multidimensional data structures, called multipadding, that can be adapted to the specificities of multicore and GPU architectures. We present multiple methods with different levels of complexity. We show on different stencil patterns that codes generated with multipadding perform better than existing optimizations.
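To make the alignment problem concrete, here is a minimal sketch of ordinary row padding for a 2D grid. It is not the multipadding method of the paper, whose details are not given in the abstract. The function names (`padded_pitch`, `make_grid`, `jacobi_step`), the 64-byte alignment, and the 4-byte element size are assumptions chosen for illustration. A Python list does not control real memory addresses, so the sketch only shows the index arithmetic that a padded layout implies.

```python
def padded_pitch(width, elem_size=4, alignment=64):
    """Smallest row length in elements >= width whose size in bytes
    is a multiple of `alignment` (both values are illustrative assumptions)."""
    if alignment % elem_size != 0:
        raise ValueError("alignment must be a multiple of elem_size")
    elems_per_line = alignment // elem_size
    # Ceiling division, then scale back up to a whole number of aligned lines.
    return -(-width // elems_per_line) * elems_per_line


def make_grid(height, width, pitch, fill=0.0):
    """Flat row-major storage; each row occupies `pitch` slots,
    of which only the first `width` hold grid values."""
    return [fill] * (height * pitch)


def jacobi_step(src, height, width, pitch):
    """One 5-point stencil sweep over interior points of a padded grid."""
    dst = list(src)
    for y in range(1, height - 1):
        row = y * pitch
        for x in range(1, width - 1):
            i = row + x
            # Vertical neighbours are one padded row (pitch) away, not `width`.
            dst[i] = 0.25 * (src[i - 1] + src[i + 1] + src[i - pitch] + src[i + pitch])
    return dst


# Usage: a 4 x 5 grid padded so that every row starts on a 64-byte boundary
# relative to the start of the array.
height, width = 4, 5
pitch = padded_pitch(width)
grid = make_grid(height, width, pitch)
grid[1 * pitch + 2] = 4.0
out = jacobi_step(grid, height, width, pitch)
```

The design point is that the stencil addresses neighbours through `pitch` rather than `width`, so the padding can be changed without touching the kernel. That separation is what lets a code generator search over layouts.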

Tags: Code generation, Computer science, CUDA, nVidia, nVidia Quadro FX 5800

March 12, 2013 by hgpu
