high performance computing on graphics processing units: hgpu.org

Posts

Aug, 15

First experiences with the Intel MIC architecture at LRZ

With the rapidly growing demand for computing power, new accelerator-based architectures have entered the world of high performance computing over roughly the past five years. In particular, GPGPUs have recently become very popular; however, programming GPGPUs using programming languages like CUDA or OpenCL is cumbersome and error-prone. Trying to overcome these difficulties, Intel developed their own […]

Aug, 14

GPU Acceleration of a Basket Option Pricing Engine

One of the most important methods for pricing complex derivatives is Monte Carlo simulation. However, this method requires a large amount of computing resources for accurate estimates. Since Monte Carlo simulations used in derivatives pricing are often parallelisable, one way to reduce the computing time is to use GPUs, which allow many copies of the […]

CUDA • OpenCL

Aug, 10

Analyzing Optimization Techniques for Power Efficiency on Heterogeneous Platforms

Graphics processing units (GPUs) have become widely accepted as the computing platform of choice in many high performance computing domains. Programming standards such as OpenCL are used to leverage the inherent parallelism offered by GPUs. Source code optimizations such as loop unrolling and tiling, when targeted to heterogeneous applications, have reported large […]

OpenCL

Aug, 9

A GPU implementation of massively parallel direction splitting for the incompressible Navier-Stokes equations

Guermond and Minev proposed a directional splitting algorithm to solve the incompressible Stokes equations. Their algorithm applies the alternating direction implicit method to the viscosity term. The pressure update uses a direction splitting method in order to enforce the incompressibility constraint, as opposed to commonly used projection methods that require the solution of a Poisson […]

OpenCL • OpenGL

Aug, 9

A GPGPU based program to solve the TDSE in intense laser fields through the finite difference approach

We present a general-purpose computing on graphics processing units (GPGPU) based computational program and framework for the electronic dynamics of atomic systems under intense laser fields. We present our results using the case of hydrogen; however, the code is trivially extensible to tackle problems within the single-active electron (SAE) approximation. Building on our previous work, […]

OpenCL

Aug, 7

Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification

This paper presents a technique to fully automatically generate efficient and readable code for parallel processors. We base our approach on skeleton-based compilation and "algorithmic species", an algorithm classification of program code. We use a tool to automatically annotate C code with species information where possible. The annotated program code is subsequently fed into the […]

CUDA • OpenCL

Aug, 6

Portable Parallel Kernels for High-Speed Beamforming in Synthetic Aperture Ultrasound Imaging

In medical ultrasound, synthetic aperture (SA) imaging is well regarded as a novel image formation technique for achieving resolution superior to that offered by existing scanners. However, its intensive processing load is known to be a challenging factor. To address such a computational demand, this paper proposes a new parallel approach based on the design of […]

OpenCL

Aug, 1

A note on the GPU acceleration of eigenvalue computations

Eigenvalue computations for large sparse matrices, such as the Lanczos method, are commonly based on Krylov subspace techniques. One of the dominant operations in such algorithms is the iterated computation of inner products with the same vector in order to preserve orthogonality of the Krylov basis. These operations can be accelerated by existing BLAS functionality using […]

CUDA • OpenCL

Aug, 1

Matrix Convolution using Parallel Programming

The convolution theorem is used to multiply matrices of two different sizes, i.e., matrices in which the number of rows in the first matrix is not equal to the number of columns in the second matrix. In this study, the multiplication of 3×3 and 4×4 matrices was done using MPI. A 3×3 matrix was taken […]

OpenCL

Jul, 12

A GPGPU-based Pipeline for Accelerated Rendering of Point Clouds

Direct rendering of large point clouds has become common practice in architecture and archaeology in recent years. Due to the high point density, no meshes are reconstructed from the scanning data; instead, the points can be rendered directly as primitives of a graphics API like OpenGL. However, these APIs and the hardware, which they are […]

OpenCL • OpenGL

Jul, 12

SIMD Divergence Optimization through Intra-Warp Compaction

SIMD execution units in GPUs are increasingly used for high performance and energy efficient acceleration of general purpose applications. However, SIMD control flow divergence effects can result in reduced execution efficiency in a class of GPGPU applications, classified as divergent applications. Improving SIMD efficiency, therefore, has the potential to bring significant performance and energy benefits […]

OpenCL • OpenGL

Jul, 7

CrowdCL: Web-Based Volunteer Computing with WebCL

We present CrowdCL, an open-source framework for the rapid development of volunteer computing and OpenCL applications on the web. Drawing inspiration from existing GPU libraries like PyCUDA, CrowdCL provides an abstraction layer for WebCL aimed at reducing boilerplate and improving code readability. CrowdCL also provides developers with a framework to easily run computations in the […]

OpenCL

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

* * *

Recent source codes

Code examples for paper on SYCL backend of Kokkos - IWOCL 2024

ROCm's implementation of Gromacs

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code
