high performance computing on graphics processing units: hgpu.org

Posts

Sep, 6

A Survey on GPU System Considering its Performance on Different Applications

In this paper we study the NVIDIA graphics processing unit (GPU), along with its computational power and applications. Although these units are designed specifically for graphics applications, we can employ their computational power for non-graphics applications too. The GPU has high parallel processing power, low cost of computation and low time utilization; it gives good result […]

CUDA

Sep, 6

Phase Aware Memory Scheduling

Computer architecture is at the brink of convergence with the integration of the general-purpose multi-core CPU architecture and the special purpose accelerated graphics architecture (GPU). Semiconductor giants like Intel and AMD have already brought to the market next-generation integrated heterogeneous processors in the form of the Sandy Bridge and the Fusion architecture respectively. However, with […]

Sep, 6

Skew Handling in Aggregate Streaming Queries on GPUs

Nowadays, the data to be processed by database systems has grown so large that any conventional, centralized technique is inadequate. At the same time, general-purpose computation on GPUs (GPGPU) has recently drawn attention from the data management community due to its ability to achieve significant speed-ups at a small cost. Efficient skew handling […]

CUDA

Sep, 6

Percolation study of samples on 2D lattices using GPUs

We study the percolation problem of sites on 2D lattices of various geometries, using general-purpose graphics processing units (GPGPU). The implementation of a parallel component labeling algorithm in CUDA, and its generalization to different geometries, are discussed. The performance results for this algorithm on a GPU versus the corresponding sequential implementation of reference […]

CUDA
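As a rough illustration of the component labeling step mentioned in the abstract above, here is a minimal sequential sketch in Python. It is not the paper's CUDA algorithm: it assumes a square lattice with 4-neighbour connectivity and uses a plain union-find, and all function names (`label_clusters`, `percolates`, `random_lattice`) are hypothetical.

```python
import random


def find(parent, i):
    """Return the root of site i, halving the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    """Merge the clusters of a and b; the smaller root index wins."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def label_clusters(grid):
    """Label 4-connected clusters of occupied sites; empty sites get -1."""
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            # Only look up and left: every bond is visited exactly once.
            if r > 0 and grid[r - 1][c]:
                union(parent, r * cols + c, (r - 1) * cols + c)
            if c > 0 and grid[r][c - 1]:
                union(parent, r * cols + c, r * cols + c - 1)
    return [
        [find(parent, r * cols + c) if grid[r][c] else -1 for c in range(cols)]
        for r in range(rows)
    ]


def percolates(grid):
    """True if one cluster touches both the top and the bottom row."""
    labels = label_clusters(grid)
    top = {x for x in labels[0] if x != -1}
    bottom = {x for x in labels[-1] if x != -1}
    return bool(top & bottom)


def random_lattice(size, p, seed=0):
    """Occupy each site independently with probability p."""
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(size)] for _ in range(size)]
```

Calling `percolates(random_lattice(64, p))` for a range of `p` values gives a crude estimate of where a spanning cluster starts to appear. Other lattice geometries would change only the neighbour checks inside `label_clusters`.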

Sep, 5

Efficient Implementation of RLS-Based Adaptive Filters on nVIDIA GeForce Graphics Processing Unit

This paper presents an efficient implementation of RLS-based adaptive filters with a large number of taps on an nVIDIA GeForce graphics processing unit (GPU) with the CUDA software development environment. Modification of the order and the combination of calculations reduces the number of accesses to slow off-chip memory. Assigning tasks into multiple threads also takes memory access order […]

CUDA
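For readers unfamiliar with the recursive least squares (RLS) update that the post above accelerates, the following is a minimal sketch of the textbook exponentially weighted recursion in plain Python. It does not reproduce the paper's reordered GPU computation; the function names, the forgetting factor `lam`, and the initial scale `delta` are assumptions chosen for illustration.

```python
import random


def rls_update(w, P, x, d, lam=0.99):
    """One RLS step: returns updated weights, inverse-correlation matrix, error."""
    n = len(w)
    # Px = P @ x; P stays symmetric, so x^T P is the same vector.
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    k = [v / denom for v in Px]                      # gain vector
    e = d - sum(w[i] * x[i] for i in range(n))       # a priori error
    w_new = [w[i] + k[i] * e for i in range(n)]
    P_new = [
        [(P[i][j] - k[i] * Px[j]) / lam for j in range(n)]
        for i in range(n)
    ]
    return w_new, P_new, e


def identify_fir(true_taps, samples=200, lam=0.99, delta=1000.0, seed=0):
    """Estimate the taps of a noiseless FIR system driven by random input."""
    rng = random.Random(seed)
    n = len(true_taps)
    w = [0.0] * n
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = [0.0] * n                                    # tapped delay line
    for _ in range(samples):
        x = [rng.uniform(-1.0, 1.0)] + x[:-1]
        d = sum(h * v for h, v in zip(true_taps, x))
        w, P, _ = rls_update(w, P, x, d, lam)
    return w
```

Each step costs O(n²) operations and touches the whole matrix `P`, which is why memory access order matters for filters with many taps.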

Sep, 5

Real-Time Motion Artifact Compensation for PMD-ToF Images

Time-of-Flight (ToF) cameras have gained a lot of scientific attention and become a vivid field of research in recent years. A remaining problem of ToF cameras is motion artifacts in dynamic scenes. This paper presents a new preprocessing method for fast motion artifact compensation. We introduce a flow-like algorithm that supports motion […]

CUDA

Sep, 5

Work in Progress: Vortex Detection and Visualization for Design of Micro Air Vehicles and Turbomachinery

Vortex detection and visualization is an important technique for computational fluid dynamics (CFD) modelers and analysts. Since vortices are often not just local phenomena, algorithms for detecting the vortex core can be expanded by the use of streamline placement and termination methodologies to appropriately visualize the vortex. We are enhancing an existing VCDetect software tool […]

CUDA

Sep, 5

Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data parallel work by taking advantage of its massive number of cores while the CPU handles non data-parallel work, such as the sequential code or data transfer management. Unfortunately, this work distribution can be a poor solution as […]

OpenCL

Sep, 5

GPU & CPU implementation of Young – Van Vliet’s Recursive Gaussian Smoothing Filter

This document describes an implementation for GPU and CPU of Young and Van Vliet’s recursive Gaussian smoothing as an external module for the Insight Toolkit (ITK), version 4.* (www.itk.org). In the absence of an OpenCL-capable platform, the code will run the CPU implementation as an alternative to the existing Deriche recursive Gaussian smoothing filter in […]

CUDA • OpenCL

Sep, 4

Generation of the Scrambled Halton Sequence Using Accelerators

The Halton sequence is one of the most popular low-discrepancy sequences. In order to satisfy some practical requirements, the original sequence is usually modified in some way. The scrambling algorithm, proposed by Owen, has several theoretical advantages, but on the other hand is difficult to implement in practice due to the trade-off between high memory […]

CUDA
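As background for the post above, the following minimal Python sketch generates the plain (unscrambled) Halton sequence via the radical inverse. Owen's scrambling, which is the subject of the paper, is not implemented here; the function names and the default bases (2, 3) are assumptions for illustration.

```python
def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_point(index, bases=(2, 3)):
    """One Halton point: a radical inverse per dimension, coprime bases."""
    return tuple(radical_inverse(index, b) for b in bases)


def halton_sequence(count, bases=(2, 3)):
    """First `count` points, starting at index 1 to skip the origin."""
    return [halton_point(i, bases) for i in range(1, count + 1)]
```

Because each point depends only on its own index, points can be generated independently of one another, which makes the unscrambled sequence easy to parallelize. Scrambled variants typically add a permutation of the digits inside `radical_inverse`.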

Sep, 4

The discrete dipole approximation code DDscat.C++: features, limitations and plans

We present new, freely available open-source C++ software for the numerical solution of electromagnetic wave absorption and scattering problems within the Discrete Dipole Approximation paradigm. The code is based upon the famous and free Fortran-90 code DDSCAT by B. Draine and P. Flatau. Started as a teaching project, the presented code DDscat.C++ differs from […]

CUDA

Sep, 4

Detecting multiple periodicities in observational data with the multi-frequency periodogram. II. Frequency Decomposer, a parallelized time-series analysis algorithm

This is a parallelized algorithm performing a decomposition of a noisy time series into a number of frequency components. The algorithm analyses all suspicious periodicities that can be revealed, including the ones that look like an alias or noise at a glance, but later may prove to be a real variation. After selection of the […]

CUDA

