high performance computing on graphics processing units: hgpu.org

CUDA 2D Stencil Computations for the Jacobi Method

Jose Maria Cecilia, Jose Manuel Garcia, Manuel Ujaldon

Computer Engineering and Technology Department, University of Murcia, Spain

Applied Parallel and Scientific Computing, Lecture Notes in Computer Science, Volume 7133/2012, 173-183, 2012

DOI:10.1007/978-3-642-28151-8_17

We are witnessing the consolidation of the GPU streaming paradigm in parallel computing. This paper explores stencil operations in CUDA to optimize the Jacobi method for solving Laplace’s differential equation on GPUs. The code keeps the access pattern constant across a large number of loop iterations, which makes it representative of a wide set of iterative linear algebra algorithms. Optimizations focus on data parallelism, thread deployment and the GPU memory hierarchy, which the CUDA programmer manages explicitly. Experimental results are shown on Nvidia Tesla C870 and C1060 GPUs and compared to a counterpart version optimized on a quad-core Intel CPU. The speed-up factor for our set of GPU optimizations reaches 3-4x, and the execution times beat those of the CPU by a wide margin, while also showing great scalability when moving towards a more sophisticated GPU architecture and/or more demanding problem sizes.
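For readers unfamiliar with the computation the abstract refers to, the following is a minimal CPU-side sketch of a Jacobi sweep for Laplace's equation on a 2D grid, using the standard 5-point stencil (each interior point becomes the average of its four neighbors). It is not the authors' code: the function names `jacobi_step`, `solve_laplace` and `make_grid`, the grid layout and the boundary values are all assumptions made for illustration. In a CUDA version, the body of the inner loop would typically be what each thread computes for one grid point.

```python
def jacobi_step(u):
    """Apply one Jacobi sweep of the 5-point stencil.

    Boundary rows and columns are treated as fixed (Dirichlet) values.
    A new grid is returned so that every update reads only old values,
    which is what distinguishes Jacobi from Gauss-Seidel.
    """
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]  # copy; boundaries carry over unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Same access pattern at every point and every iteration.
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1])
    return new


def solve_laplace(u, iterations):
    """Run a fixed number of Jacobi sweeps (hypothetical helper)."""
    for _ in range(iterations):
        u = jacobi_step(u)
    return u


def make_grid(n, top=1.0):
    """Build an n x n grid of zeros with the top edge held at `top`."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [top] * n
    return grid


if __name__ == "__main__":
    result = solve_laplace(make_grid(8), iterations=200)
    for row in result:
        print(" ".join(f"{value:.3f}" for value in row))
```

Because every interior point is updated independently from the previous iterate, the two nested loops carry no dependencies within a sweep, which is the data parallelism the abstract mentions.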

Tags: Algorithms, Computer science, CUDA, Data parallelism, Differential equations, Linear Algebra, nVidia, Optimization, Tesla C1060, Tesla C870

March 16, 2012 by hgpu
