high performance computing on graphics processing units: hgpu.org

Posts

Apr, 4

The optimization of parallel Smith-Waterman sequence alignment using on-chip memory of GPGPU

Memory optimization is an important strategy for gaining high performance in sequence alignment implemented with CUDA on GPGPUs. The Smith-Waterman (SW) algorithm is the most sensitive algorithm widely used for local sequence alignment, but it is very time-consuming. Although several parallel methods have been used in some studies and have shown good performance, advantages of GPGPU memory hierarchy […]

CUDA
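For readers unfamiliar with SW, here is a minimal Python sketch of the textbook recurrence, assuming a linear gap penalty and illustrative scores (match = 2, mismatch = -1, gap = -1). It is not the paper's on-chip-memory GPU method. It fills the matrix anti-diagonal by anti-diagonal because all cells on one anti-diagonal are independent, which is the property GPU implementations typically exploit.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, H) for a linear-gap Smith-Waterman alignment.

    Cells are filled anti-diagonal by anti-diagonal: every cell on
    diagonal d = i + j depends only on diagonals d-1 and d-2, so the
    cells inside one diagonal could be computed in parallel.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):
        # Keep 1 <= i <= n and 1 <= j = d - i <= m.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # diagonal: match/mismatch
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best, H
```

For example, `smith_waterman("ACGT", "ACGT")[0]` is 8 (four matches at 2 points each). On a GPU, the same wavefront order lets one thread handle each cell of the current diagonal.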

Apr, 4

Parallel connected-component labeling algorithm for GPGPU applications

This paper proposes a new connected-component labeling algorithm for GPGPU applications based on NVIDIA’s CUDA. Various approaches and algorithms for connected-component labeling with minimal execution time have been designed, but most of them have focused on optimizing CPU algorithms. Therefore, it is hard to apply these approaches to GPGPU programming models such […]

CUDA
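As a point of reference, here is a sketch of one GPU-friendly labeling scheme, iterative minimum-label propagation on a 4-connected binary image, written in plain Python. This is a generic technique and not necessarily the algorithm the paper proposes. Each sweep reads only the previous sweep's labels, which corresponds to one kernel launch per iteration on a GPU.

```python
def label_components(grid):
    """Label 4-connected nonzero pixels by iterative min-label propagation.

    Each foreground pixel starts with a unique label (its linear index) and
    repeatedly takes the minimum label among itself and its foreground
    neighbours until no label changes. Background pixels get -1.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[r * cols + c if grid[r][c] else -1 for c in range(cols)]
              for r in range(rows)]
    changed = True
    while changed:
        changed = False
        new = [row[:] for row in labels]  # double buffer, like a GPU sweep
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] < 0:
                    continue
                best = labels[r][c]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] >= 0:
                        best = min(best, labels[rr][cc])
                if best != labels[r][c]:
                    new[r][c] = best
                    changed = True
        labels = new
    return labels
```

The number of sweeps grows with the longest path inside a component, which is one reason published GPU methods add union-find or local shared-memory passes on top of plain propagation.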

Apr, 4

Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications

GPUs are emerging as general-purpose high-performance computing devices. Growing GPGPU research has made numerous GPGPU workloads available. However, a systematic approach to characterize these benchmarks and analyze their implications for GPU microarchitecture design evaluation is still lacking. In this research, we propose a set of microarchitecture-agnostic GPGPU workload characteristics to represent them […]

CUDA

Apr, 4

Parallel Exact Inference on a CPU-GPGPU Heterogenous System

Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model. Achieving scalability over hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to […]

CUDA

Apr, 4

A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL

Heterogeneous multi-core platforms are increasingly prevalent due to their perceived superior performance over homogeneous systems. The best performance, however, can only be achieved if tasks are accurately mapped to the right processors. OpenCL programs can be partitioned to take advantage of all the available processors in a system. However, finding the best partitioning for any […]

OpenCL

Apr, 3

GPU Accelerated Solver of Time-Dependent Air Pollutant Transport Equations

The main objective of this paper is to outline possible ways to achieve substantial acceleration of advection-diffusion equation (A-DE) calculations; the A-DE is commonly used to describe pollutant behavior in the atmosphere. The A-DE is a kind of partial differential equation (PDE), and in the general case it is usually solved by numerical […]

CUDA
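To make "solved by numerical methods" concrete, here is a minimal sketch of one explicit finite-difference step for the 1-D constant-coefficient A-DE, ∂c/∂t + u ∂c/∂x = D ∂²c/∂x². The 1-D setting, constant wind speed u ≥ 0, constant diffusivity D, and periodic boundaries are all simplifying assumptions; the paper's actual discretization is not given in the abstract. Each new grid value depends only on three old values, so every point can be updated by an independent GPU thread.

```python
def advect_diffuse_step(c, u, D, dx, dt):
    """One explicit step of dc/dt + u dc/dx = D d2c/dx2 on a periodic 1-D grid.

    Advection: first-order upwind (assumes u >= 0).
    Diffusion: second-order central difference.
    With a = u*dt/dx and k = D*dt/dx**2, the new value is a weighted sum of
    c[i-1], c[i], c[i+1] with weights (a + k), (1 - a - 2k), k; when
    a + 2k <= 1 all weights are nonnegative, so no new extrema appear.
    """
    n = len(c)
    a = u * dt / dx
    k = D * dt / dx**2
    # c[i - 1] with i == 0 wraps to c[-1], giving the periodic boundary.
    return [c[i]
            - a * (c[i] - c[i - 1])
            + k * (c[(i + 1) % n] - 2 * c[i] + c[i - 1])
            for i in range(n)]
```

With periodic boundaries the update conserves total mass exactly (up to rounding), which is a convenient sanity check when porting such a stencil to CUDA.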

Apr, 3

GPU-Accelerated Method of Moments by Example: Monostatic Scattering

In this paper, we combine and extend two of our previous works to provide a more complete solution for the GPU acceleration of the Method of Moments, using CUDA by NVIDIA. To this end, the formulations of the original 1982 Rao-Wilton-Glisson paper are revisited, and the scattering analysis of a square PEC plate is considered […]

CUDA

Apr, 3

An Accelerated IHS Transform Fusion of Remote Sensing Image Data Based on GPU

In this paper, we designed a remote sensing image data fusion algorithm on the GPU (Graphics Processing Unit), exploiting the programmability of the GPU as a parallel vector processor. Both the forward and inverse IHS transform computations were mapped onto the GPU. We realized parallel rendering and output of the three components of the IHS, […]
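For illustration, the sketch below uses one common linear IHS variant: intensity I = (R + G + B)/3 plus two orthogonal chroma components v1, v2, from which hue and saturation follow as atan2(v2, v1) and the length of (v1, v2). The paper may use a different IHS model. Fusion replaces I with a co-registered panchromatic value and applies the inverse transform. Every pixel is independent, which is what makes the forward and inverse transforms a natural fit for per-pixel GPU threads or fragment shaders.

```python
import math

SQ2 = math.sqrt(2.0)

def rgb_to_ihs(r, g, b):
    """Linear forward IHS transform of one pixel: returns (I, v1, v2)."""
    i = (r + g + b) / 3.0
    v1 = SQ2 * (-r - g + 2.0 * b) / 6.0
    v2 = (r - g) / SQ2
    return i, v1, v2

def ihs_to_rgb(i, v1, v2):
    """Exact inverse of rgb_to_ihs."""
    r = i - v1 / SQ2 + v2 / SQ2
    g = i - v1 / SQ2 - v2 / SQ2
    b = i + SQ2 * v1
    return r, g, b

def ihs_fuse(rgb_pixels, pan_pixels):
    """Replace each pixel's intensity with the panchromatic value, then invert."""
    fused = []
    for (r, g, b), p in zip(rgb_pixels, pan_pixels):
        _, v1, v2 = rgb_to_ihs(r, g, b)
        fused.append(ihs_to_rgb(p, v1, v2))
    return fused
```

Because this variant is linear, the fused pixel simplifies algebraically to (R + P - I, G + P - I, B + P - I), where P is the panchromatic value. Some implementations use that shortcut and skip the explicit forward/inverse pair.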

Apr, 3

A New GPU-Based Neighbor Search Algorithm for Fluid Simulations

Fluid simulations based on Smoothed Particle Hydrodynamics (SPH) have been widely used for generating complex fluid motion. However, implementations of particle neighbor search on the graphics processing unit (GPU) have not been satisfactory until now. In this paper, we present a new grid-based neighbor search method on the GPU for GPU-based SPH fluid simulation. Using this new […]
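As background for "grid-based neighbor search", here is a minimal 2-D sketch of the standard uniform-grid (cell list) approach, not the paper's new method. It assumes the cell size equals the search radius h, so every neighbor of a particle lies in its own cell or one of the 8 surrounding cells. GPU implementations commonly replace the dictionary used here with particles sorted by cell index plus per-cell start/end offsets.

```python
from collections import defaultdict
from math import floor

def build_grid(points, h):
    """Bucket 2-D points into square cells of side h (the search radius)."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / h), floor(y / h))].append(idx)
    return cells

def neighbors(points, h):
    """For each point, return the sorted indices of other points within distance h.

    Only the point's own cell and its 8 surrounding cells are scanned.
    """
    cells = build_grid(points, h)
    result = []
    for i, (x, y) in enumerate(points):
        cx, cy = floor(x / h), floor(y / h)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids inserting empty cells into the defaultdict
                for j in cells.get((cx + dx, cy + dy), ()):
                    px, py = points[j]
                    if j != i and (px - x) ** 2 + (py - y) ** 2 <= h * h:
                        found.append(j)
        result.append(sorted(found))
    return result
```

This reduces the per-particle cost from scanning all N particles to scanning roughly the particles in 9 cells. The 3-D version scans 27 cells.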

Apr, 3

Efficient Parallel Algorithm for Nonlinear Dimensionality Reduction on GPU

Advances in nonlinear dimensionality reduction provide a way to understand and visualize the underlying structure of complex data sets. The performance of large-scale nonlinear dimensionality reduction is of key importance in data mining, machine learning, and data analysis. In this paper, we concentrate on improving the performance of nonlinear dimensionality reduction using large-scale data sets […]

Apr, 3

Accelerate Smoothed Particle Hydrodynamics using GPU

Physics-based fluid simulation is used extensively nowadays; however, the traditional serial algorithm cannot meet real-time requirements because it is complex and compute-intensive. The development of modern GPUs makes real-time simulation possible. In this paper, a Smoothed Particle Hydrodynamics (SPH) method for incompressible fluid was implemented using CUDA on the GPU. Since the algorithm was executed on […]

CUDA
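The core SPH step is a kernel-weighted sum over nearby particles. Below is a minimal sketch of density summation, ρ_i = Σ_j m W(|x_i − x_j|, h). It assumes equal particle masses and the 3-D poly6 kernel, one popular choice in real-time SPH; the abstract does not say which kernel the paper uses. The brute-force double loop is O(N²) and is exactly what a grid-based neighbor search replaces. Each particle's sum is independent, so it maps to one GPU thread per particle.

```python
import math

def poly6(r2, h):
    """3-D poly6 smoothing kernel, evaluated from the squared distance r2."""
    if r2 >= h * h:
        return 0.0  # compact support: no contribution beyond radius h
    return 315.0 / (64.0 * math.pi * h**9) * (h * h - r2) ** 3

def densities(positions, mass, h):
    """rho_i = sum_j mass * W(|x_i - x_j|, h), including the self term j = i."""
    rho = []
    for xi in positions:
        total = 0.0
        for xj in positions:
            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            total += mass * poly6(r2, h)
        rho.append(total)
    return rho
```

An isolated particle gets only its self-contribution, mass · 315 / (64π h³).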

Apr, 3

GPU acceleration of MOLAR for HRRT List-Mode OSEM reconstructions

The Siemens ECAT HRRT PET scanner has the potential to produce images of the human brain with spatial resolution better than 3 mm. MOLAR (a motion-compensation OSEM List-mode Algorithm for Resolution-recovery) was developed to provide reconstructions of HRRT data with the best possible accuracy and precision. However, a computer cluster is required to generate reconstructions […]

CUDA
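For orientation, here is a minimal sketch of the generic ordered-subsets EM (OSEM) update for Poisson data y ≈ A x, using a small dense system matrix A. MOLAR itself is list-mode and adds motion compensation and resolution recovery, none of which is modeled here. Each sub-iteration applies the MLEM multiplicative update using only the measurement rows in one subset. Forward projection and backprojection are the sums that GPU ports parallelize.

```python
def osem(A, y, subsets, n_iter, x0=None):
    """Ordered-subsets EM for y ~ Poisson(A x); A is a dense list of rows.

    For each subset S of row indices:
        x_j <- x_j * [sum_{i in S} A_ij * y_i / (A x)_i] / [sum_{i in S} A_ij]
    A single subset containing all rows gives plain MLEM.
    """
    n = len(A[0])
    x = list(x0) if x0 is not None else [1.0] * n
    for _ in range(n_iter):
        for S in subsets:
            # Forward projection (A x)_i for the rows in this subset.
            proj = {i: sum(A[i][j] * x[j] for j in range(n)) for i in S}
            new_x = []
            for j in range(n):
                sens = sum(A[i][j] for i in S)  # subset sensitivity
                if sens == 0:
                    new_x.append(x[j])  # voxel not seen by this subset
                    continue
                # Backprojection of measured/estimated ratios; rows with a
                # zero projection are skipped to avoid dividing by zero.
                back = sum(A[i][j] * y[i] / proj[i] for i in S if proj[i] > 0)
                new_x.append(x[j] * back / sens)
            x = new_x
    return x
```

With A equal to the identity and a uniform start, one MLEM iteration returns y itself, a quick check that the update is wired correctly.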


* * *


Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container
