high performance computing on graphics processing units: hgpu.org

Posts

Nov, 14

Efficient Implementation of the Simplex Method on a CPU-GPU System

The Simplex algorithm is a well-known method for solving linear programming (LP) problems. In this paper, we propose a parallel implementation of the Simplex method on a CPU-GPU system via CUDA. A double-precision implementation is used in order to improve the quality of solutions. Computational tests have been carried out on randomly generated instances for […]

CUDA

Nov, 14

A fast and robust seed flooding algorithm on GPU for Voronoi diagram generation

The Voronoi diagram (VD) is a fundamental data structure in computational geometry. With the rapid development of programmable graphics processing units, using the GPU to construct VDs has become an optimal strategy. Considering the limitations of state-of-the-art algorithms, a seed flooding algorithm (SFA) is presented to achieve both robustness and high performance. The experimental results show that SFA can […]

CUDA

Nov, 14

B-CALM: An open-source GPU-based 3D-FDTD with multi-pole dispersion for plasmonics

Numerical calculations with finite-difference time-domain (FDTD) on metallic nanostructures in a broad optical spectrum require an accurate approximation of the permittivity of dispersive materials. Here, we present the algorithms behind B-CALM (Belgium-California Light Machine), an open-source 3D-FDTD solver operating on Graphics Processing Units (GPUs) with multi-pole dispersion models. Our modified architecture shows a reduction in […]

Nov, 14

GPU Based Tissue Doppler Imaging

Tissue Doppler imaging is a routinely used diagnostic tool for assessing myocardial function in real time. The required signal processing is computationally intensive, including modified auto-correlation, scan conversion, and image mapping. Parallel algorithms and implementations based on a GPU platform are proposed in this paper to increase computational efficiency. The experimental signal data is acquired from […]

CUDA

Nov, 14

Extinction-Based Shading and Illumination in GPU Volume Ray-Casting

Direct volume rendering has become a popular method for visualizing volumetric datasets. Even though computers are continually getting faster, it remains a challenge to incorporate sophisticated illumination models into direct volume rendering while maintaining interactive frame rates. In this paper, we present a novel approach for advanced illumination in direct volume rendering based on GPU […]

CUDA

Nov, 13

Comprehensive Performance Monitoring for GPU Cluster Systems

Accelerating applications with GPUs has recently garnered a lot of interest from the scientific computing community. While tools for optimizing individual kernels are readily available, there is a lack of support for the specific needs of the HPC area. Most importantly, integration with existing parallel programming models (MPI and threading) and scalability to the full […]

CUDA

Nov, 13

Advantages and GPU implementation of high-performance indexed DNA search based on suffix arrays

A comparative analysis is presented of high-performance implementations of two state-of-the-art index structures that are of particular interest in bioinformatics applications for accelerating the alignment of DNA sequences. The two indexes are based on suffix trees and suffix arrays and were implemented on two different platforms: a quad-core CPU […]

CUDA

Nov, 13

Fast Level Set Segmentation of Biomedical Images using Graphics Processing Units

Image segmentation is the task of splitting a digital image into one or more regions of interest. It is a fundamental problem in computer vision and many different methods, each with their own advantages and disadvantages, exist for the task. Image segmentation is a particularly difficult task for several reasons. Firstly, the ambiguous nature of […]

CUDA

Nov, 13

Efficient GPU Implementation for Particle in Cell Algorithm

The particle-in-cell (PIC) algorithm is a widely used method in plasma physics for studying the trajectories of charged particles under electromagnetic fields. The PIC algorithm is computationally intensive, and its time requirements are proportional to the number of charged particles involved in the simulation. The focus of the paper is to parallelize the PIC […]

CUDA

Nov, 13

Molecular Docking on FPGA and GPU Platforms

Molecular docking is an important problem in bioinformatics that aims to predict the binding poses of molecules. AutoDock is a popular, open-source docking program that applies a computationally expensive but parallelizable algorithm. This paper introduces an FPGA-based and a GPU-based implementation of AutoDock and shows how the original algorithm can be effectively accelerated on […]

CUDA

Nov, 12

Efficiently Computing Tensor Eigenvalues on a GPU

The tensor eigenproblem has many important applications, generating both mathematical and application-specific interest in the properties of tensor eigenpairs and methods for computing them. A tensor is an m-way array, generalizing the concept of a matrix (a 2-way array). Kolda and Mayo have recently introduced a generalization of the matrix power method for computing real-valued […]

CUDA

Nov, 12

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters

The increasing computing requirements for GPUs (Graphics Processing Units) have favoured the design and marketing of commodity devices that nowadays can also be used to accelerate general purpose computing. Therefore, future high performance clusters intended for HPC (High Performance Computing) will likely include such devices. However, high-end GPU-based accelerators used in HPC feature a considerable […]

CUDA
