high performance computing on graphics processing units: hgpu.org

Posts

Sep, 5

A GPGPU Implementation of Approximate String Matching with Regular Expression Operators and Comparison with Its FPGA Implementation

In this paper, we propose an efficient GPGPU implementation of an algorithm for approximate string matching with regular expression operators, originally implemented on an FPGA, and compare the GPGPU, FPGA and CPU implementations by experiments. Approximate string matching with regular expression operators is used in various applications, such as full text database search and DNA […]

CUDA

Sep, 5

GPU-accelerated Fourier-continuation solvers and physically exact computational boundary conditions for wave scattering problems

Many important engineering problems, ranging from antenna design to seismic imaging, require the numerical solution of problems of time-domain propagation and scattering of acoustic, electromagnetic, elastic waves, etc. These problems present several key difficulties, including numerical dispersion, the need for computational boundary conditions, and the extensive computational cost that arises from the extremely large number […]

CUDA

Sep, 4

GPU implementation of a hybrid lattice Boltzmann method for non-isothermal flows

We propose a novel method to simulate non-isothermal flows. This method is ideally suited for the GPU architecture. The new algorithm is derived by coupling the lattice Boltzmann formulation for the flow with the finite difference scheme for the temperature field. We apply this algorithm to solve for the flow in the well-known buoyancy […]

CUDA

Sep, 4

Automated Enhanced Parallelization of Sequential C to Parallel OpenMP

The paper presents the work towards implementation of a technique to enhance parallel execution of auto-generated OpenMP programs by considering the architecture of on-chip cache memory, thereby achieving higher performance. It avoids false-sharing in ‘for-loops’ by generating OpenMP code for dynamically scheduling chunks by placing each core’s data cache line size apart. It has been […]

Sep, 4

Accelerating distance matrix calculations utilizing GPU

When modeling pedestrian movement, it is necessary to find a path to the target point. A distance matrix or a derived gradient map can be used for this purpose. Calculating the distance matrix for large areas and multiple targets is very time-consuming. Therefore, this article focuses on accelerating these calculations utilizing Graphics Processing […]

OpenCL

Sep, 4

Computational Modelling of Galaxy Formation using FLAME GPU

As hardware has become increasingly powerful, the doors have been opened for a wide range of more computationally intensive simulation procedures. In particular, agent-based modelling has seen a recent surge of interest in the fields of Biology and Economics. For this project we propose using an agent-based model to create an implementation of the classic […]

CUDA

Sep, 4

Approximate Similarity Search for Online Multimedia Services on Distributed CPU-GPU Platforms

Similarity search in high-dimensional spaces is a pivotal operation found in a variety of database applications. Recently, there has been increased interest in similarity search for online content-based multimedia services. Those services, however, introduce new challenges with respect to the very large volumes of data that have to be indexed/searched, and the need to minimize […]

CUDA

Sep, 3

Energy Transfer Ray Tracing with OptiX

QUIC Energy is an energy modeling system for urban environments. Our research group has developed QUIC Energy as a part of a set of GPU-assisted tools with a common goal of increasing knowledge relating urban organization and design with environmental concerns. We hypothesize that it is possible to optimize urban organization, building placement, and material […]

CUDA • OpenGL

Sep, 3

Accelerated Flow Visualization of Advective-Diffusive Mixing Processes Using GPUs

This article presents a strategy to accelerate the simulation and visualization of combined advective-diffusive mixing of a contaminant inside a square cavity with time-dependent boundary conditions. No moving walls are required to mix the fluid; mixing is driven instead by natural convection arising from periodic temperatures on opposite walls. A contaminant will diffuse asymptotically to uniform concentration. Advective mixing […]

CUDA • OpenGL

Sep, 3

Mixed-Resolution Patch-Matching

Matching patches of a source image with patches of itself or a target image is a first step for many operations. Finding the optimum nearest-neighbors of each patch using a global search of the image is expensive. Optimality is often sacrificed for speed as a result. We present the Mixed-Resolution Patch-Matching (MRPM) algorithm that uses […]

CUDA

Sep, 3

GPU-accelerated WZ Factorization with the Use of the CUBLAS Library

We present a novel implementation of a dense, square, non-structured matrix factorization algorithm, namely the WZ factorization, using graphics processors (GPUs) and CPUs to achieve high performance at low cost. We rewrite this factorization as operations on blocks of matrices and vectors. We have implemented our block-vector algorithm on […]

CUDA

Sep, 3

Solving Systems of Polynomial Equations on a GPU

This paper explores the opportunities of using a GPGPU to solve systems of polynomial equations. We propose numerical real root-finding based on recursive de Casteljau subdivision over an n-dimensional rectangular domain. Two variants of parallelism, multithreading and multiprocessing, have been investigated. The speed, memory consumption and resistance for different sets of input data have also been examined.

CUDA

DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training

GigaAPI for GPU Parallelization

Recent source codes

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA

Data-efficient LLM Fine-tuning for Code Generation

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Large Language Model Powered C-to-CUDA Code Translation: A Novel Auto-Parallelization Framework

GigaAPI: a user-space API that simplifies multi-GPU programming, bridging the gap between the capabilities of parallel GPU systems and the ability of developers to harness their full potential

Coccinelle: a C code transformation engine using SmPL for matches, refactorings, and bug fixing

DuoReduce: MLIR's benchmark

Shamrock: Multi-GPU hydrodynamics for astrophysics

LLMPerf: GPU Performance Modeling meets Large Language Models

Most viewed papers (last 30 days)