high performance computing on graphics processing units: hgpu.org

Posts

May, 6

Efficient nearest-neighbor computation for GPU-based motion planning

We present a novel k-nearest neighbor search (KNNS) algorithm for proximity computation in motion planning algorithms that exploits the computational capabilities of many-core GPUs. Our approach uses locality-sensitive hashing and cuckoo hashing to construct an efficient KNNS algorithm that has linear space and time complexity and exploits the multiple cores and data parallelism effectively. […]

CUDA
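To make the locality-sensitive hashing idea in the motion planning abstract above concrete, here is a minimal CPU sketch, not the authors' GPU method. It assumes random-projection hashing with a fixed bucket width, and it swaps the cuckoo hash table for an ordinary Python dictionary. The class name `LSHIndex` and its parameters (`num_hashes`, `bucket_width`, `seed`) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


class LSHIndex:
    """Toy locality-sensitive hash index (illustrative assumption, CPU only)."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = bucket_width
        # One random Gaussian direction and one random offset per hash function.
        self.projections = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)
        ]
        self.offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        # A plain dict stands in for the cuckoo hash table named in the abstract.
        self.buckets = defaultdict(list)
        self.points = []

    def _key(self, point):
        # Quantize each projection; nearby points tend to share the same key.
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, point)) + b) / self.width)
            for proj, b in zip(self.projections, self.offsets)
        )

    def add(self, point):
        self.points.append(point)
        self.buckets[self._key(point)].append(len(self.points) - 1)

    def query(self, point, k):
        # Only points in the query's bucket are ranked, so results are approximate.
        candidates = self.buckets.get(self._key(point), [])
        ranked = sorted(candidates, key=lambda i: math.dist(point, self.points[i]))
        return ranked[:k]


index = LSHIndex(dim=2)
index.add((0.0, 0.0))
index.add((100.0, 100.0))
nearest = index.query((0.0, 0.0), k=1)  # indices of stored points, closest first
```

Because only one bucket is inspected per query, a true neighbor that lands in a different bucket is missed. Practical schemes usually mitigate this with several hash tables, which this sketch omits.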

May, 6

Robust Adaptive 3-D Segmentation of Vessel Laminae From Fluorescence Confocal Microscope Images and Parallel GPU Implementation

This paper presents robust 3-D algorithms to segment vasculature that is imaged by labeling laminae, rather than the lumenal volume. The signal is weak, sparse, noisy, nonuniform, low-contrast, and exhibits gaps and spectral artifacts, so adaptive thresholding and Hessian filtering based methods are not effective. The structure deviates from a tubular geometry, so tracing algorithms […]

CUDA

May, 6

Mechanical Characterization and Performance Optimization for GPU Fan-Sink Cooling Module Assembly

Three GPU fan-sink cooling module assembly mounting mechanisms are mechanically characterized to determine the relationships between the clamping forces and screw torques. The first-order screw torque solutions are determined from the statistical regressions according to current industry recommendations. The screw tension force theoretical solution is derived for application to the finite-element model to assess the […]

May, 6

Large-scale multi-dimensional document clustering on GPU clusters

Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial […]

CUDA

May, 5

Towards accelerating molecular modeling via multi-scale approximation on a GPU

Research efforts to analyze biomolecular properties contribute towards our understanding of biomolecular function. Calculating non-bonded forces (or in our case, electrostatic surface potential) is often a large portion of the computational complexity in analyzing biomolecular properties. Therefore, reducing the computational complexity of these force calculations, either by improving the computational algorithm or by improving the […]

May, 5

Towards real time vision based UUV navigation using GPU technology

The last decade has witnessed the establishment of image processing as a viable means of aiding underwater navigation. However, many such systems are only implemented in pre-processing and offline due to their excessive computational demands. Real-time techniques often require special purpose hardware or impose limitations on the system to obtain real-time performance at the expense […]

CUDA

May, 5

The implementation of Multi-Scale Retinex image enhancement algorithm based on GPU via CUDA

The MSR (Multi-Scale Retinex) image enhancement algorithm can produce the best performance in most cases, but its computational load is very high, especially for large images. In this paper, an efficient approach is proposed to accelerate MSR image enhancement on the GPU via CUDA (Compute Unified Device Architecture). Time-consuming modules such as the multi-scale Gaussian filter, […]

CUDA
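The Multi-Scale Retinex entry above mentions multi-scale Gaussian filtering as a costly module. As a rough illustration of why, the sketch below applies the commonly cited MSR formulation (an equally weighted sum of log-ratios between the signal and its Gaussian-blurred versions) to a 1-D signal on the CPU. It is not the paper's CUDA implementation. The scale values, the clamped borders, and the `+ 1.0` offset that avoids `log(0)` are assumptions made for this example.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about three sigma."""
    radius = max(1, int(3 * sigma))
    weights = [
        math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Gaussian blur with clamped (edge-replicating) borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_retinex(signal, sigmas=(1.0, 4.0, 16.0)):
    """Average of log(signal) - log(blurred signal) over several scales."""
    weight = 1.0 / len(sigmas)
    result = [0.0] * len(signal)
    for sigma in sigmas:
        blurred = blur(signal, sigma)
        for i, (value, smooth) in enumerate(zip(signal, blurred)):
            # The +1.0 offset is an assumption to keep the logarithm defined at zero.
            result[i] += weight * (math.log(value + 1.0) - math.log(smooth + 1.0))
    return result


row = [10.0] * 5 + [100.0] + [10.0] * 5
enhanced = multi_scale_retinex(row)  # the bright sample stands out from its surround
```

Each scale requires a full convolution pass and the kernel grows with sigma, so the blur dominates the cost. Every output sample is independent of the others, which is the kind of data parallelism a GPU can exploit.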

May, 5

K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching

The k-nearest neighbor (kNN) search problem is widely used in domains and applications such as classification, statistics, and biology. In this paper, we propose two fast GPU-based implementations of the brute-force kNN search algorithm using the CUDA and CUBLAS APIs. We show that our CUDA and CUBLAS implementations are up to, respectively, 64X and 189X […]

CUDA
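For readers unfamiliar with the brute-force kNN search named in the entry above, the following is a minimal CPU reference sketch, not the CUDA or CUBLAS implementations the paper describes. The function name `brute_force_knn` is hypothetical. Squared Euclidean distance is used because it preserves the ranking while avoiding a square root.

```python
import heapq


def brute_force_knn(references, queries, k):
    """For each query, return the k closest references as (squared_distance, index)."""
    results = []
    for query in queries:
        # Exhaustively score every reference point against this query.
        scored = [
            (sum((q - r) ** 2 for q, r in zip(query, ref)), index)
            for index, ref in enumerate(references)
        ]
        # Keep only the k smallest distances, sorted in ascending order.
        results.append(heapq.nsmallest(k, scored))
    return results


references = [(3, 4), (0, 1), (6, 8)]
neighbors = brute_force_knn(references, [(0, 0)], k=2)
# neighbors == [[(1, 1), (25, 0)]]
```

The work is proportional to the number of queries times the number of references times the dimension, and every query-reference pair can be scored independently. That independence is what makes the brute-force approach a natural fit for GPUs despite its cost.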

May, 5

Optimization and parameter exploration using GPU based FDTD solvers

Graphics processing units (GPUs) have been documented for the implementation of the FDTD technique. The use of these specialized processors for the implementation of numerical codes has been shown to significantly speed up the execution of these codes over standard CPU-based solvers. With the execution of the FDTD method being reduced to a matter […]

May, 5

Programming Challenges for the Implementation of Numerical Quadrature in Atomic Physics on FPGA and GPU Accelerators

Although the need for heterogeneous chips in high performance numerical computing was identified by Chillemi and co-authors in 2001, it is only over the past five years that it has emerged as the new frontier for HPC. In this environment one or more accelerators work symbiotically, on each node, with a multi-core CPU. Two such […]

May, 5

A GPU-based architecture for improved online rebinning performance in clinical 3-D PET

Online rebinning is an important and well-established technique for reducing the time required to process PET data. However, the need for efficient data processing in a clinical setting is growing rapidly and is beginning to exceed the capability of traditional online processing methods. High-count rate applications such as rubidium 3-D PET studies can easily saturate […]

May, 5

GPU acceleration for statistical gene classification

The use of bioinformatic tools in routine clinical diagnostics is still facing a number of issues. The more complex and advanced bioinformatic tools become, the more performance is required of the computing platforms. Unfortunately, the cost of parallel computing platforms is usually prohibitive for both public and small private medical practices. This paper presents a […]

CUDA

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container
