high performance computing on graphics processing units: hgpu.org

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

Tokyo Institute of Technology, Tokyo, Japan

arXiv:2002.05983 [cs.DC], (14 Feb 2020)

DOI:10.1109/IPDPSW.2018.00027

In this paper, we evaluate the performance of FPGAs for high-order stencil computation using High-Level Synthesis. We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order ones, our design technique with combined spatial and temporal blocking remains effective. This allows us to reach similar, or even higher, compute performance compared to first-order stencils. We use an OpenCL-based design that, apart from parameterizing performance knobs, also parameterizes the stencil radius. Furthermore, we show that our performance model predicts the performance of high-order stencils as accurately as that of first-order ones. On an Intel Arria 10 GX 1150 device, for 2D and 3D star-shaped stencils, we achieve over 700 and 270 GFLOP/s of compute performance, respectively, up to a stencil radius of four. These results outperform the state-of-the-art YASK framework on a modern Xeon for 2D and 3D stencils, and outperform a modern Xeon Phi for 2D stencils, while achieving competitive performance in 3D. Furthermore, our FPGA design achieves better power efficiency in almost all cases.
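To make the notion of a star-shaped stencil with a parameterized radius concrete, the following is a minimal sketch in plain Python. It is not the authors' OpenCL kernel: the function names `star_stencil_2d` and `star_points_2d`, the coefficient layout (one weight per distance, shared by all four directions), and the choice to leave boundary cells unchanged are all assumptions made for illustration.

```python
def star_stencil_2d(grid, radius, center, coeffs):
    """Apply one time step of a 2D star-shaped stencil.

    grid   : list of rows, each a list of floats
    radius : stencil radius (1 gives the classic 5-point stencil)
    center : weight of the cell itself (hypothetical value)
    coeffs : coeffs[d - 1] is the weight of the four neighbors at
             distance d (hypothetical layout, one weight per distance)

    Cells closer than `radius` to the edge are copied unchanged,
    which is an assumption of this sketch.
    """
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for y in range(radius, ny - radius):
        for x in range(radius, nx - radius):
            acc = center * grid[y][x]
            # Walk outward along the four axis-aligned arms of the star.
            for d in range(1, radius + 1):
                c = coeffs[d - 1]
                acc += c * grid[y][x - d]
                acc += c * grid[y][x + d]
                acc += c * grid[y - d][x]
                acc += c * grid[y + d][x]
            out[y][x] = acc
    return out


def star_points_2d(radius):
    """Number of input cells read per output cell: the center plus
    `radius` cells on each of the four arms."""
    return 4 * radius + 1
```

In this sketch, each output cell reads 4r + 1 inputs for radius r, so a radius of four touches 17 cells per update, compared with 5 for a first-order stencil. That growth is the higher computation intensity and on-chip memory requirement the abstract refers to. Calling `star_stencil_2d` repeatedly on its own output advances the grid by several time steps; temporal blocking, as the term is generally used, fuses several such steps so that intermediate grids stay on chip rather than going back to external memory.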

Tags: Computer science, FPGA, OpenCL, Stencil computation

February 23, 2020 by hgpu
