high performance computing on graphics processing units: hgpu.org

Posts

Jun, 22

Comparison of Random Number Generators in Particle Swarm Optimization Algorithm

Intelligent optimization algorithms are very effective at tackling complex problems that would be difficult or impossible to solve exactly. A key component of these algorithms is the random number generator (RNG), which provides the random numbers that drive the stochastic search process. Much effort is devoted to developing efficient RNGs with good statistical properties, and many […]

CUDA

Jun, 20

Acceleration of GPU-based ultrasound simulation via data compression

The realistic simulation of ultrasound wave propagation is computationally intensive. The large size of the grid and the low degree of data reuse mean that it places a great demand on memory bandwidth. Graphics Processing Units (GPUs) have attracted attention for performing scientific calculations due to their potential for efficiently performing large numbers of floating […]

CUDA

Jun, 20

GPU based FDTD method for investigation on the electromagnetic scattering from 1-D rough soil surface

In this paper, a graphics processing unit (GPU) implementation of the finite-difference time-domain (FDTD) algorithm is presented to investigate the electromagnetic (EM) scattering from a one-dimensional (1-D) Gaussian rough soil surface. The FDTD lattices are truncated by a uniaxial perfectly matched layer (UPML), in which the finite-difference equations are carried out for the total computation […]

CUDA

Jun, 20

A Fast Mixed-Band Lifting Wavelet Transform on the GPU

The discrete wavelet transform (DWT) has been widely used in many image compression applications, such as JPEG2000 and compressive sensing MRI. Even though a lifting scheme [1] has been widely adopted to accelerate DWT, only a handful of studies have addressed its efficient implementation on many-core accelerators, such as graphics processing units (GPUs). Moreover, […]

CUDA

Jun, 20

MIC-SVM: Designing A Highly Efficient Support Vector Machine For Advanced Modern Multi-Core and Many-Core Architectures

Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases attach increasing importance to analytic capabilities. In recent years, SVM has been adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing […]

CUDA

Jun, 20

GPU Based Fast Free-Wake Calculations For Multiple Horizontal Axis Wind Turbine Rotors

Unsteady free-wake solutions of wind turbine flow fields involve computationally intensive interaction calculations, which generally limit the total amount of simulation time or the number of turbines that can be simulated by the method. This problem, however, can be addressed easily using a high level of parallelization. Especially when exploited with a GPU, a Graphics Processing Unit, […]

CUDA

Jun, 19

Accelerated SQLite Database using GPUs

This paper introduces the development of a new GPU-based database to accelerate data retrieval. The main goal is to explore new ways of handling complex data types and managing data and workloads in massively parallel databases. This paper presents three innovations to create an efficient virtual database engine that executes the majority of database […]

CUDA

Jun, 19

Parallel track reconstruction in CMS using the cellular automaton approach

The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is a general-purpose particle detector and comprises the largest silicon-based tracking system built to date, with 75 million individual readout channels. The precise reconstruction of particle tracks from this tremendous number of input channels is a compute-intensive task. The foreseen LHC beam parameters […]

OpenCL

Jun, 19

GPU/CPU Parallel Computation of Material Damage

In this paper CUDA (Compute Unified Device Architecture) programming and OpenMP (Open Multi-Processing) are used for the GPU (Graphics Processing Unit) and CPU (Central Processing Unit) parallel computation of material damage. The material damage is evaluated by a multilevel finite element analysis within material domains reconstructed from a high-resolution micro-focus X-ray computed tomography system. An […]

CUDA

Jun, 19

Fast Sequence Alignment Method Using CUDA-enabled GPU

Sequence alignment is a task that calculates the degree of similarity between two sequences. Given a query sequence, finding the database sequence that is most similar to the query by sequence alignment is the first step in bioinformatics research. The first sequence alignment algorithm was proposed by Needleman and Wunsch. They got the optimal global […]

CUDA
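
As an illustration of the global alignment scoring mentioned in the sequence alignment entry above, here is a minimal CPU-side sketch of Needleman-Wunsch dynamic programming in Python. It is not the paper's CUDA method, which the truncated abstract does not describe. The function name `needleman_wunsch_score` and the scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration only.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Hypothetical helper for illustration; scoring parameters are assumed,
    not taken from the listed paper.
    """
    # prev[j] is the best score for aligning the processed prefix of a
    # with b[:j]; initially the prefix is empty, so only gaps are possible.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Vertical move: gap in b. Horizontal move: gap in a.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Only two rows of the score table are kept because this sketch returns the score alone; recovering the alignment itself would need the full table for traceback. Each cell depends on its left, upper, and upper-left neighbours, which is the dependency pattern that parallel implementations have to work around.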

Jun, 19

Medusa: A Parallel Graph Processing System on Graphics Processors

Medusa is a parallel graph processing system on graphics processors (GPUs). The core design of Medusa is to enable developers to leverage the massive parallelism and other hardware features of GPUs by writing sequential C/C++ code for a small set of APIs. This simplifies the implementation of parallel graph processing on the GPU. The runtime […]

CUDA

Jun, 18

Dealing With Big Data Outside Of The Cloud: GPU Accelerated Sort

The demands placed on systems that analyse corpus data increase with input size, and traditional approaches to processing this data increasingly have impractical run times. We show that the use of desktop GPUs presents a significant opportunity to accelerate a number of stages in the normal corpus analysis pipeline. This paper contains our exploratory […]

CUDA
