A Performance Model and Optimization Strategies for Automatic GPU Code Generation of PDE Systems Described by a Domain-Specific Language

Yue Hu
Electrical & Computer Engineering, Louisiana State University, 2016


@phdthesis{hu2016performance,
   title={A Performance Model and Optimization Strategies for Automatic GPU Code Generation of PDE Systems Described by a Domain-Specific Language},
   author={Hu, Yue},
   school={Louisiana State University},
   year={2016}
}






Stencil computations are a class of algorithms that operate on multi-dimensional arrays, also called grid functions (GFs), updating each array element from the values of its nearest neighbors. This type of computation underlies computer simulations in almost every field of science, such as computational fluid dynamics. Its mostly regular data-access patterns make it well suited to exploiting the high computational throughput and memory bandwidth of GPUs. However, manual GPU programming is time-consuming and error-prone, and requires in-depth knowledge of GPU architecture and programming. To overcome these difficulties, a number of stencil frameworks have been developed that automatically generate GPU code from user-written stencil code, usually expressed in a domain-specific language (DSL). Previous stencil frameworks demonstrate the feasibility of this approach, but real stencil applications expose a set of new challenges. This dissertation builds on the Chemora stencil framework and aims to handle real stencil applications better, especially large stencil calculations. Such calculations typically involve dozens of GFs with a variety of stencil patterns, yielding an extremely large space of possible code generations. First, we propose an algorithm that maps a calculation onto one or more kernels by minimizing off-chip memory accesses while maintaining relatively high thread-level parallelism. Second, we propose an efficiency-based buffering algorithm that scores a change in a GF's buffering strategy using a performance estimate and its resource usage. Let b (here, 5) denote the number of buffering strategies the framework supports; with this algorithm, a near-optimal solution for a calculation with N GFs can be found in (b-1)N(N+1)/2 steps instead of b^N. Third, we wrote a set of microbenchmarks to explore and measure performance-critical GPU microarchitectural features and parameters for better performance modeling.
Finally, we propose an analytic performance model to predict the execution time of the generated kernels.
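As a rough illustration of the search-space reduction claimed in the abstract, the sketch below counts candidate evaluations for a greedy buffering search that, in each of up to N rounds, rescans the remaining GFs and tries the b-1 alternatives to each GF's current strategy, versus exhaustively enumerating all b^N strategy assignments. The function names are hypothetical and this is only a step-counting model, not Chemora's actual implementation.

```python
def greedy_steps(b: int, n: int) -> int:
    # Round k rescans k remaining grid functions, trying the (b - 1)
    # alternative buffering strategies for each one:
    # sum_{k=1}^{N} (b-1)*k = (b-1) * N * (N+1) / 2
    return sum((b - 1) * remaining for remaining in range(1, n + 1))

def exhaustive_steps(b: int, n: int) -> int:
    # Exhaustive search tries every combination of b strategies over n GFs.
    return b ** n

b, n = 5, 20  # b = 5 buffering strategies, as in the dissertation
print(greedy_steps(b, n))      # (b-1)*N*(N+1)/2 = 4 * 20 * 21 / 2 = 840
print(exhaustive_steps(b, n))  # 5**20, roughly 9.5e13
```

Even for a modest calculation with 20 GFs, the greedy pass evaluates 840 candidates where exhaustive enumeration would need about 9.5 x 10^13, which is why a scoring-based incremental search becomes necessary at this scale.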


HGPU group © 2010-2021 hgpu.org
