high performance computing on graphics processing units: hgpu.org

Posts

Sep, 6

Percolation study of samples on 2D lattices using GPUs

We study the site percolation problem on 2D lattices of various geometries using general-purpose graphics processing units (GPGPUs). The implementation of a parallel component-labeling algorithm in CUDA, and its generalization to different geometries, is discussed. The performance results for this algorithm on a GPU versus the corresponding sequential reference implementation […]

CUDA
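
The abstract names a parallel component-labeling algorithm but gives no details. As a rough sequential point of reference (not the authors' CUDA implementation), the sketch below labels clusters of occupied sites on a square lattice with union-find and checks whether any cluster spans from the top row to the bottom row. The function names, the 4-neighbour square-lattice connectivity and the spanning criterion are assumptions chosen for illustration; another lattice geometry would mainly change which neighbours are merged.

```python
import random

def find(parent, i):
    """Return the root label of site i, halving the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def union(parent, a, b):
    """Merge the clusters of sites a and b, keeping the smaller root as label."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

def label_clusters(grid):
    """Label occupied sites of a square lattice (4-neighbour connectivity).

    grid is a list of rows of booleans; empty sites get label -1.
    """
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            i = r * cols + c
            # Checking only the right and lower neighbours visits each bond once.
            if c + 1 < cols and grid[r][c + 1]:
                union(parent, i, i + 1)
            if r + 1 < rows and grid[r + 1][c]:
                union(parent, i, i + cols)
    return [[find(parent, r * cols + c) if grid[r][c] else -1
             for c in range(cols)] for r in range(rows)]

def spans_vertically(grid):
    """True if some cluster touches both the top and the bottom row."""
    labels = label_clusters(grid)
    top = {lab for lab in labels[0] if lab >= 0}
    bottom = {lab for lab in labels[-1] if lab >= 0}
    return bool(top & bottom)

def random_lattice(n, p, seed=0):
    """n x n lattice in which each site is occupied with probability p."""
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]

if __name__ == "__main__":
    # Spanning fraction below and above the square-lattice site
    # threshold (p_c is roughly 0.5927).
    for p in (0.4, 0.6, 0.8):
        hits = sum(spans_vertically(random_lattice(48, p, seed=s)) for s in range(40))
        print(p, hits / 40)
```

A GPU version would typically replace the sequential loop with many threads that repeatedly propagate the minimum label between neighbours until nothing changes; the union-find form above is simply the easiest correct baseline to compare against.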

Sep, 5

Efficient Implementation of RLS-Based Adaptive Filters on nVIDIA GeForce Graphics Processing Unit

This paper presents an efficient implementation of RLS-based adaptive filters with a large number of taps on an nVIDIA GeForce graphics processing unit (GPU) using the CUDA software development environment. Reordering and combining calculations reduces the number of accesses to slow off-chip memory. Assigning tasks to multiple threads also takes memory access order […]

CUDA
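
For readers unfamiliar with RLS, the textbook exponentially weighted recursive-least-squares update is sketched below in plain Python. It is a sequential reference only, not the GPU implementation or the reordered computation the paper describes; the forgetting factor `lam`, the initialization constant `delta`, the class name `RLSFilter` and the helper `identify` are illustrative assumptions. Every update touches the full taps-by-taps matrix `P`, which is why memory traffic dominates when the number of taps is large.

```python
import random

class RLSFilter:
    """Exponentially weighted RLS adaptive FIR filter (textbook form)."""

    def __init__(self, taps, lam=0.99, delta=0.01):
        self.taps = taps
        self.lam = lam
        self.w = [0.0] * taps
        # Inverse correlation matrix, initialised to I / delta.
        self.P = [[(1.0 / delta if i == j else 0.0) for j in range(taps)]
                  for i in range(taps)]

    def update(self, x, d):
        """Consume input vector x (newest sample first) and desired output d.

        Returns the a priori error d - w^T x.
        """
        n = self.taps
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        k = [v / denom for v in Px]  # gain vector
        e = d - sum(self.w[i] * x[i] for i in range(n))
        self.w = [self.w[i] + k[i] * e for i in range(n)]
        # P is symmetric, so x^T P equals (P x)^T and Px can be reused.
        self.P = [[(self.P[i][j] - k[i] * Px[j]) / self.lam for j in range(n)]
                  for i in range(n)]
        return e

def identify(h, n_samples=300, seed=1):
    """Estimate an unknown FIR system h from noise-free input/output data."""
    rng = random.Random(seed)
    rls = RLSFilter(len(h))
    history = [0.0] * len(h)  # newest sample first
    for _ in range(n_samples):
        history = [rng.uniform(-1.0, 1.0)] + history[:-1]
        d = sum(hi * xi for hi, xi in zip(h, history))
        rls.update(history, d)
    return rls.w

if __name__ == "__main__":
    print(identify([0.5, -0.3, 0.2]))
```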

Sep, 5

Real-Time Motion Artifact Compensation for PMD-ToF Images

Time-of-Flight (ToF) cameras have gained a lot of scientific attention and have become an active field of research in recent years. One remaining problem with ToF cameras is motion artifacts in dynamic scenes. This paper presents a new preprocessing method for fast motion artifact compensation. We introduce a flow-like algorithm that supports motion […]

CUDA

Sep, 5

Work in Progress: Vortex Detection and Visualization for Design of Micro Air Vehicles and Turbomachinery

Vortex detection and visualization is an important technique for computational fluid dynamics (CFD) modelers and analysts. Since vortices are often not just local phenomena, algorithms for detecting the vortex core can be expanded by the use of streamline placement and termination methodologies to appropriately visualize the vortex. We are enhancing an existing VCDetect software tool […]

CUDA

Sep, 5

Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data-parallel work by taking advantage of its massive number of cores, while the CPU handles non-data-parallel work, such as sequential code or data transfer management. Unfortunately, this work distribution can be a poor solution as […]

OpenCL

Sep, 5

GPU & CPU implementation of Young – Van Vliet’s Recursive Gaussian Smoothing Filter

This document describes GPU and CPU implementations of Young and Van Vliet’s recursive Gaussian smoothing as an external module for the Insight Toolkit (ITK), version 4.* (www.itk.org). In the absence of an OpenCL-capable platform, the code will run the CPU implementation as an alternative to the existing Deriche recursive Gaussian smoothing filter in […]

CUDA • OpenCL
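
The recursive filter itself is compact. The following Python sketch applies a 1-D Young and van Vliet smoother using the coefficient formulas from their 1995 paper: a causal pass followed by an anti-causal pass, each with DC gain 1. It is not the ITK module's code; the function names and the choice to initialise each pass with the edge sample (which leaves a constant signal unchanged) are assumptions made for this example. An image filter would apply the same 1-D pass along each axis.

```python
import math

def yvv_coefficients(sigma):
    """Return (B, b1/b0, b2/b0, b3/b0) for Young-van Vliet smoothing (sigma >= 0.5)."""
    if sigma >= 2.5:
        q = 0.98711 * sigma - 0.96330
    elif sigma >= 0.5:
        q = 3.97156 - 4.14554 * math.sqrt(1.0 - 0.26891 * sigma)
    else:
        raise ValueError("sigma must be at least 0.5")
    b0 = 1.57825 + 2.44413 * q + 1.4281 * q**2 + 0.422205 * q**3
    b1 = 2.44413 * q + 2.85619 * q**2 + 1.26661 * q**3
    b2 = -(1.4281 * q**2 + 1.26661 * q**3)
    b3 = 0.422205 * q**3
    B = 1.0 - (b1 + b2 + b3) / b0
    return B, b1 / b0, b2 / b0, b3 / b0

def _recursive_pass(x, B, a1, a2, a3):
    """One causal pass: y[n] = B x[n] + a1 y[n-1] + a2 y[n-2] + a3 y[n-3]."""
    y1 = y2 = y3 = x[0]  # edge-value initialisation (assumption)
    out = []
    for v in x:
        y = B * v + a1 * y1 + a2 * y2 + a3 * y3
        out.append(y)
        y3, y2, y1 = y2, y1, y
    return out

def yvv_gaussian_1d(x, sigma):
    """Smooth a 1-D sequence with the forward/backward recursive filter."""
    B, a1, a2, a3 = yvv_coefficients(sigma)
    forward = _recursive_pass(list(x), B, a1, a2, a3)
    # The anti-causal pass is the causal pass run over the reversed signal.
    backward = _recursive_pass(forward[::-1], B, a1, a2, a3)
    return backward[::-1]

if __name__ == "__main__":
    impulse = [0.0] * 101
    impulse[50] = 1.0
    response = yvv_gaussian_1d(impulse, sigma=3.0)
    print(max(range(101), key=response.__getitem__), sum(response))
```

The cost per sample is a handful of multiply-adds regardless of sigma, which is the main attraction of recursive Gaussian filters over direct convolution with a wide kernel.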

Sep, 4

Generation of the Scrambled Halton Sequence Using Accelerators

The Halton sequence is one of the most popular low-discrepancy sequences. To satisfy practical requirements, the original sequence is usually modified in some way. The scrambling algorithm proposed by Owen has several theoretical advantages but is difficult to implement in practice due to the trade-off between high memory […]

CUDA
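
The unscrambled Halton sequence is built from radical inverses in coprime (usually prime) bases. The Python sketch below computes Halton points and can optionally apply one random permutation to the digits of each base. This digit-permutation scrambling is a simpler scheme swapped in for illustration. It is not Owen's nested scrambling discussed in the paper, which needs a separate permutation for every digit prefix and therefore grows quickly in memory. All function names here are hypothetical.

```python
import random

def radical_inverse(n, base, perm=None):
    """Reflect the base-`base` digits of n about the radix point.

    If perm is given, each digit d is replaced by perm[d] first.
    perm[0] must be 0 so the implicit trailing zeros stay zero.
    """
    inv_base = 1.0 / base
    factor = inv_base
    result = 0.0
    while n > 0:
        n, digit = divmod(n, base)
        if perm is not None:
            digit = perm[digit]
        result += digit * factor
        factor *= inv_base
    return result

def digit_permutations(bases, seed=0):
    """One random digit permutation per base, keeping 0 fixed."""
    rng = random.Random(seed)
    perms = {}
    for b in bases:
        rest = list(range(1, b))
        rng.shuffle(rest)
        perms[b] = [0] + rest
    return perms

def halton_point(index, bases, perms=None):
    """The index-th point of the (optionally scrambled) Halton sequence."""
    return [radical_inverse(index, b, None if perms is None else perms[b])
            for b in bases]

if __name__ == "__main__":
    bases = [2, 3, 5]
    perms = digit_permutations(bases, seed=42)
    for i in range(1, 6):
        print(halton_point(i, bases), halton_point(i, bases, perms))
```

With this scheme the base-2 coordinate is never changed, because the only nonzero binary digit can only map to itself.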

Sep, 4

The discrete dipole approximation code DDscat.C++: features, limitations and plans

We present new, freely available, open-source C++ software for the numerical solution of electromagnetic wave absorption and scattering problems within the Discrete Dipole Approximation paradigm. The code is based on the well-known, free Fortran-90 code DDSCAT by B. Draine and P. Flatau. Having started as a teaching project, DDscat.C++ differs from […]

CUDA

Sep, 4

Detecting multiple periodicities in observational data with the multi-frequency periodogram. II. Frequency Decomposer, a parallelized time-series analysis algorithm

This is a parallelized algorithm that decomposes a noisy time series into a number of frequency components. The algorithm analyses all suspicious periodicities that can be revealed, including those that look like aliases or noise at first glance but may later prove to be real variations. After selection of the […]

CUDA

Sep, 4

Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor

With its ease of programming, flexibility, and efficiency, MapReduce has become one of the most popular frameworks for building big-data applications. MapReduce was originally designed for distributed computing and has been extended to various architectures, e.g., multi-core CPUs, GPUs, and FPGAs. In this work, we focus on optimizing the MapReduce framework on Xeon Phi, which is the […]

Sep, 4

Accelerating a Cloud-Based Software GNSS Receiver

In this paper, we discuss ways to reduce the execution time of a software Global Navigation Satellite System (GNSS) receiver that is meant for offline operation in a cloud environment. Client devices record the satellite signals they receive and send them to the cloud to be processed by this software. The goal of this project is […]

CUDA

Sep, 2

Accurate and Efficient Filtering using Anisotropic Filter Decomposition

Efficient filtering remains an important challenge in computer graphics, particularly when filters are spatially-varying, have large extent, and/or exhibit complex anisotropic profiles. We present an efficient filtering approach for these difficult cases based on anisotropic filter decomposition (IFD). By decomposing complex filters into linear combinations of simpler, displaced isotropic kernels, and precomputing a compact prefiltered […]

CUDA



