high performance computing on graphics processing units: hgpu.org

Posts

Oct, 4

Performance Portability Evaluation for OpenACC on Intel Knights Corner and Nvidia Kepler

OpenACC is a programming standard designed to simplify heterogeneous parallel programming through compiler directives. Since OpenACC can generate OpenCL and CUDA code, and the CAPS HMPP compiler supports running OpenCL on Intel Knights Corner, it is attractive to use OpenACC on hardware with different underlying microarchitectures. This paper studies how realistic it is to […]

Sep, 30

A GPU cluster optimized multigrid scheme for computing unsteady incompressible fluid flow

A multigrid scheme is proposed that can be implemented efficiently on modern CPUs, many integrated core devices (MICs), and graphics processing units (GPUs). It is shown that wide single instruction multiple data (SIMD) processing engines are used efficiently when a deep 2h grid hierarchy is replaced with a two-level scheme using 16h-32h restriction. The […]

OpenCL

Sep, 27

GPU-TLS: An Efficient Runtime for Speculative Loop Parallelization on GPUs

Recently, GPUs have emerged as an important parallel platform for general-purpose applications in both HPC and cloud environments. Because of their special execution model, developing programs for GPUs is difficult even with the recent introduction of high-level languages such as CUDA and OpenCL. To ease the programming effort, some research has proposed automatically generating parallel […]

CUDA
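The abstract does not describe GPU-TLS's runtime design, so the following is only a generic sketch of thread-level speculation for a loop, assuming a simple model: every iteration runs against the same memory snapshot while its reads and writes are logged, iterations are then committed in order until one has read a location written by an earlier iteration, and the rest are re-executed sequentially. The names `TrackedMemory`, `speculative_loop`, `double` and `running_sum` are hypothetical, and the "parallel" phase is simulated serially.

```python
from typing import Callable, Dict, List, Set, Tuple


class TrackedMemory:
    """Per-iteration view of shared memory: logs reads, buffers writes."""

    def __init__(self, backing: List[int]) -> None:
        self.backing = backing
        self.reads: Set[int] = set()
        self.writes: Dict[int, int] = {}

    def load(self, addr: int) -> int:
        if addr in self.writes:  # reading our own earlier write is always safe
            return self.writes[addr]
        self.reads.add(addr)
        return self.backing[addr]

    def store(self, addr: int, value: int) -> None:
        self.writes[addr] = value  # buffered until the iteration commits


Body = Callable[[int, TrackedMemory], None]


def speculative_loop(mem: List[int], n: int, body: Body) -> Tuple[List[int], int]:
    """Run iterations 0..n-1 speculatively.

    Returns the final memory and the first misspeculated iteration (-1 if none).
    """
    snapshot = list(mem)
    # Phase 1: every iteration runs against the same snapshot, as concurrent
    # GPU threads would; here they are simply executed one after another.
    logs = []
    for i in range(n):
        view = TrackedMemory(snapshot)
        body(i, view)
        logs.append(view)

    # Phase 2: commit in iteration order until an iteration read a location
    # that an earlier iteration wrote (a cross-iteration RAW dependence).
    result = list(mem)
    written: Set[int] = set()
    first_conflict = -1
    for i, view in enumerate(logs):
        if view.reads & written:
            first_conflict = i
            break
        for addr, value in view.writes.items():
            result[addr] = value
        written.update(view.writes)

    # Phase 3: recovery by re-executing the remaining iterations sequentially.
    if first_conflict >= 0:
        for i in range(first_conflict, n):
            view = TrackedMemory(result)
            body(i, view)
            for addr, value in view.writes.items():
                result[addr] = value
    return result, first_conflict


def double(i: int, m: TrackedMemory) -> None:
    m.store(i, m.load(i) * 2)  # iterations are independent


def running_sum(i: int, m: TrackedMemory) -> None:
    if i > 0:
        m.store(i, m.load(i - 1) + m.load(i))  # depends on iteration i-1
```

With these bodies, `speculative_loop([1, 2, 3, 4], 4, double)` commits everything from the speculative pass and returns `([2, 4, 6, 8], -1)`, while `speculative_loop([1, 1, 1, 1], 4, running_sum)` detects the dependence at iteration 2, falls back to sequential execution, and returns `([1, 2, 3, 4], 2)`. Real runtimes use finer-grained recovery than this all-or-nothing fallback.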

Sep, 20

gNek: A GPU Accelerated Incompressible Navier Stokes Solver

This thesis presents a GPU accelerated implementation of a high order splitting scheme with a spectral element discretization for the incompressible Navier Stokes (INS) equations. While others have implemented this scheme on clusters of processors using the Nek5000 code, to my knowledge this thesis is the first to explore its performance on the GPU. This […]

OpenCL

Sep, 18

Sparse Matrix Algorithms Using GPGPU

The purpose of this thesis was to benchmark and compare different representations of sparse matrices and algorithms for multiplying them with a vector, and to measure the performance differences between running the algorithms on a CPU and on one or more GPUs. Four storage formats were tested: full matrix storage, coordinate storage (COO), ELLPACK (ELL), compressed sparse […]

OpenCL
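To make the storage formats concrete, here is a minimal CPU sketch of sparse matrix-vector multiplication (y = Ax) in COO and ELL, plus CSR as a widely used compressed sparse format. The excerpt is cut off at "compressed sparse", so whether CSR is the thesis's fourth format is an assumption, and none of this is the thesis's OpenCL code; all function names are hypothetical.

```python
from typing import List, Tuple

Dense = List[List[float]]


def to_coo(a: Dense) -> Tuple[List[int], List[int], List[float]]:
    """Coordinate format: one (row, col, value) triple per nonzero."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            if v != 0.0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def spmv_coo(n_rows: int, coo, x: List[float]) -> List[float]:
    rows, cols, vals = coo
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]  # one GPU thread per nonzero would need an atomic add here
    return y


def to_csr(a: Dense):
    """CSR: nonzeros stored row by row; row_ptr[i]..row_ptr[i+1] spans row i."""
    row_ptr, cols, vals = [0], [], []
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                cols.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return row_ptr, cols, vals


def spmv_csr(csr, x: List[float]) -> List[float]:
    row_ptr, cols, vals = csr
    return [sum(vals[k] * x[cols[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]


def to_ell(a: Dense):
    """ELLPACK: every row padded to the longest row's nonzero count."""
    width = max(sum(1 for v in row if v != 0.0) for row in a)
    cols = [[0] * width for _ in a]
    vals = [[0.0] * width for _ in a]  # padding holds 0.0, so it adds nothing
    for i, row in enumerate(a):
        k = 0
        for j, v in enumerate(row):
            if v != 0.0:
                cols[i][k], vals[i][k] = j, v
                k += 1
    return cols, vals


def spmv_ell(ell, x: List[float]) -> List[float]:
    cols, vals = ell
    return [sum(v * x[c] for c, v in zip(crow, vrow)) for crow, vrow in zip(cols, vals)]
```

The trade-off the benchmarks probe shows up even here: ELL gives every row the same amount of work (regular, GPU-friendly) but wastes storage when row lengths vary, while CSR is compact but gives threads uneven per-row work.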

Sep, 15

Algorithmic GPGPU Memory Optimization

The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on the memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support to handle irregular memory access patterns. Application performance can be significantly improved […]

OpenCL
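The sensitivity to memory access behaviour can be illustrated with a deliberately simplified coalescing model, assuming 32-thread warps, 4-byte elements and aligned 128-byte memory segments. Real GPUs differ by generation (for example, 32-byte sectors and caches), so this counts segments rather than predicting timings. It is not the paper's technique, and the helper names are hypothetical.

```python
from typing import List, Sequence


def segments_touched(addresses: Sequence[int], segment_bytes: int = 128) -> int:
    """Distinct aligned memory segments covered by one warp's byte addresses."""
    return len({addr // segment_bytes for addr in addresses})


def strided_addresses(stride: int, base: int = 0, elem_bytes: int = 4,
                      warp_size: int = 32) -> List[int]:
    """Byte address loaded by each thread t of a warp: base + t * stride * elem_bytes."""
    return [base + t * stride * elem_bytes for t in range(warp_size)]


for stride in (1, 2, 4, 32):
    n = segments_touched(strided_addresses(stride))
    print(f"stride {stride:2d}: {n:2d} segment(s) for 128 useful bytes")
```

Under this model, a unit-stride warp load fits in a single segment, but stride 32 touches 32 segments to deliver the same 128 useful bytes. Even a 4-byte misalignment of a unit-stride access doubles the count to 2.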

Sep, 13

Simulation and modeling of physical broadcasts

The environment around us contains many phenomena whose behavior depends on various biological, chemical, physical, and other parameters. To represent a simplified, abstract view of this environment, we use a concept called environmental modeling. Environmental modeling deals with many environmental problems, such as air pollution, the diffusion of disease, animal behavior and so […]

CUDA

Sep, 13

Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP […]

OpenCL
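The abstract does not state which kernel or neighbour search neptune uses. As a reference point, here is the textbook SPH density summation, rho_i = sum_j m_j W(x_i - x_j, h), with the standard 1D cubic spline kernel, evaluated over all pairs in O(N^2). A production code would use neighbour lists or trees, and the function names here are illustrative only.

```python
from typing import List


def cubic_spline_1d(r: float, h: float) -> float:
    """Standard 1D cubic spline (M4) kernel; support radius 2h, unit integral."""
    sigma = 2.0 / (3.0 * h)  # 1D normalisation constant
    q = abs(r) / h
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def sph_density(x: List[float], m: List[float], h: float) -> List[float]:
    """rho_i = sum_j m_j W(x_i - x_j, h), evaluated over all pairs (O(N^2))."""
    return [sum(mj * cubic_spline_1d(xi - xj, h) for xj, mj in zip(x, m)) for xi in x]


dx, rho0 = 0.1, 1000.0
x = [i * dx for i in range(21)]  # uniform 1D particle lattice
m = [rho0 * dx] * len(x)         # mass chosen so that interior rho = rho0
rho = sph_density(x, m, h=dx)
```

With the smoothing length equal to the particle spacing, an interior particle sees W(0) + 2W(dx) = 1/dx, so its density comes out as rho0. A particle at the end of the lattice is missing neighbours on one side and is underestimated, which is the familiar SPH boundary deficiency.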

Sep, 11

Hardware-Oblivious Parallelism for In-Memory Column-Stores

The multi-core architectures of today’s computer systems make parallelism a necessity for performance critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: Current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential of modern parallel hardware architectures like multi-core CPUs and graphics cards. We […]

OpenCL

Sep, 5

Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data parallel work by taking advantage of its massive number of cores while the CPU handles non data-parallel work, such as the sequential code or data transfer management. Unfortunately, this work distribution can be a poor solution as […]

OpenCL
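The paper's own work-distribution mechanism is not described in the excerpt. One simple way to move away from fixed CPU/GPU roles is to measure each device's throughput on a short probe run and then split a data-parallel index range so that both devices should finish at about the same time. This static, proportional split is an assumption for illustration. The functions are hypothetical, and the profiling numbers below are made up rather than measured.

```python
from typing import List, Tuple


def estimate_throughput(items_processed: int, seconds: float) -> float:
    """Items per second observed during a short profiling run on one device."""
    return items_processed / seconds


def split_by_throughput(n_items: int, throughputs: List[float]) -> List[Tuple[int, int]]:
    """Split [0, n_items) into contiguous chunks proportional to device throughput,
    so that all devices are expected to finish at roughly the same time."""
    total = sum(throughputs)
    chunks, start, acc = [], 0, 0.0
    for k, t in enumerate(throughputs):
        acc += t
        # The last device takes the remainder so rounding never drops items.
        end = n_items if k == len(throughputs) - 1 else round(n_items * acc / total)
        chunks.append((start, end))
        start = end
    return chunks


# Hypothetical probe results: the GPU processed 3x as many items per second.
cpu = estimate_throughput(10_000, 0.02)
gpu = estimate_throughput(30_000, 0.02)
print(split_by_throughput(1_000_000, [cpu, gpu]))
```

A static split like this breaks down when throughput varies during execution. Dynamic schemes that hand out smaller chunks on demand trade some scheduling overhead for better balance.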

Sep, 5

GPU & CPU implementation of Young – Van Vliet’s Recursive Gaussian Smoothing Filter

This document describes a GPU and CPU implementation of Young and Van Vliet’s recursive Gaussian smoothing as an external module for the Insight Toolkit (ITK), version 4.* (www.itk.org). In the absence of an OpenCL-capable platform, the code runs the CPU implementation as an alternative to the existing Deriche recursive Gaussian smoothing filter in […]

CUDA • OpenCL
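For readers unfamiliar with the filter, here is a 1D Python sketch of the published Young and van Vliet (1995) recursion: a third-order causal pass followed by an anti-causal pass, with coefficients derived from sigma. It is not the ITK module's C++/OpenCL code. The boundary handling (replicating the edge value into the filter history) is a simplifying assumption and may differ from the module. Images are smoothed by applying the 1D filter along each axis in turn.

```python
import math
from typing import List, Tuple


def yvv_coefficients(sigma: float) -> Tuple[float, float, float, float, float]:
    """Coefficients b0..b3 and gain B from Young and van Vliet (1995)."""
    if sigma >= 2.5:
        q = 0.98711 * sigma - 0.96330
    else:  # published fit for 0.5 <= sigma < 2.5
        q = 3.97156 - 4.14554 * math.sqrt(1.0 - 0.26891 * sigma)
    b0 = 1.57825 + 2.44413 * q + 1.4281 * q**2 + 0.422205 * q**3
    b1 = 2.44413 * q + 2.85619 * q**2 + 1.26661 * q**3
    b2 = -(1.4281 * q**2 + 1.26661 * q**3)
    b3 = 0.422205 * q**3
    B = 1.0 - (b1 + b2 + b3) / b0  # makes the DC gain of each pass exactly 1
    return b0, b1, b2, b3, B


def yvv_smooth(x: List[float], sigma: float) -> List[float]:
    """1D recursive Gaussian: causal pass followed by anti-causal pass."""
    b0, b1, b2, b3, B = yvv_coefficients(sigma)
    n = len(x)
    # Forward (causal) pass; history initialised by replicating the first sample.
    w = [0.0] * n
    p1 = p2 = p3 = x[0]
    for i in range(n):
        w[i] = B * x[i] + (b1 * p1 + b2 * p2 + b3 * p3) / b0
        p1, p2, p3 = w[i], p1, p2
    # Backward (anti-causal) pass; history initialised by replicating the last value.
    y = [0.0] * n
    p1 = p2 = p3 = w[-1]
    for i in range(n - 1, -1, -1):
        y[i] = B * w[i] + (b1 * p1 + b2 * p2 + b3 * p3) / b0
        p1, p2, p3 = y[i], p1, p2
    return y
```

The cost per sample is a fixed handful of multiply-adds regardless of sigma, which is the attraction of recursive Gaussian filters over direct convolution for large sigma. Because the forward and backward passes use the same coefficients, the combined response is zero-phase, so an impulse is smoothed symmetrically about its original position.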

Aug, 26

Estimating the WCET of GPU-Accelerated Applications using Hybrid Analysis

The massive parallelism offered by Graphics Processing Units (GPUs) is now routinely exploited to accelerate computationally intensive tasks in a wide variety of application domains. Efficient GPU programming in languages such as CUDA and OpenCL requires careful application of hand optimisations to exploit parallelism and locality while minimising synchronisation. The effectiveness of such optimisations can […]

CUDA • OpenCL
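The excerpt does not say how the paper's hybrid analysis works. In general terms, a hybrid (measurement-based) WCET estimate combines the worst observed time of each basic block with a static search for the longest path through the control-flow graph. The sketch below shows only that generic idea under stated assumptions: the CFG is acyclic (loops already bounded and unrolled), and no GPU-specific effects such as warp scheduling or memory contention are modelled. The functions and the example CFG and timings are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def block_maxima(traces: List[Dict[str, float]]) -> Dict[str, float]:
    """Worst observed execution time of each basic block across all measured runs."""
    worst: Dict[str, float] = {}
    for trace in traces:
        for block, t in trace.items():
            worst[block] = max(worst.get(block, 0.0), t)
    return worst


def wcet_bound(cfg: Dict[str, List[str]], entry: str, cost: Dict[str, float]) -> float:
    """Longest entry-to-exit path through an acyclic CFG, weighting blocks by cost.

    Loops must already be bounded and unrolled; a cycle raises graphlib.CycleError.
    """
    preds: Dict[str, List[str]] = {b: [] for b in cfg}
    for block, succs in cfg.items():
        for s in succs:
            preds.setdefault(s, []).append(block)
    longest: Dict[str, float] = {}
    # TopologicalSorter takes a node -> predecessors mapping.
    for block in TopologicalSorter(preds).static_order():
        reachable = [longest[p] for p in preds[block] if p in longest]
        if block == entry:
            longest[block] = cost.get(block, 0.0)
        elif reachable:
            longest[block] = max(reachable) + cost.get(block, 0.0)
    exits = [b for b in longest if not cfg.get(b)]
    return max(longest[b] for b in exits)


# Diamond-shaped CFG: A branches to B or C, both rejoin at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
traces = [{"A": 1.0, "B": 5.0, "D": 2.0}, {"A": 2.0, "C": 3.0, "D": 1.0}]
print(wcet_bound(cfg, "A", block_maxima(traces)))
```

Here the estimate is 9.0, although the two measured runs took only 8.0 and 6.0 end to end. Combining per-block maxima with the static path search covers combinations that no single measurement exercised. The result is still only as safe as the observed block maxima.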

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

* * *

Recent source codes

QArray

Celerity: High-level C++ for Accelerator Clusters

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Optical flow algorithms for SYCL

OpenMP5-Offload-OpenMC-Intel-PVC
