high performance computing on graphics processing units: hgpu.org

Posts

Sep, 1

Parallel GPU-accelerated Recursion-based Generators of Pseudorandom Numbers

The aim of the paper is to show how to design fast parallel algorithms for linear congruential and lagged Fibonacci pseudorandom number generators. The new algorithms employ the divide-and-conquer approach for solving linear recurrence systems and can be easily implemented on GPU-accelerated hybrid systems using CUDA or OpenCL. Numerical experiments performed on a computer system […]

CUDA, OpenCL
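The abstract above is truncated, so the paper's actual algorithm is not shown here. As a minimal sketch of the general idea only: a linear congruential generator x -> (a*x + c) mod m is an affine map, so k steps can be collapsed into a single affine map by repeated squaring, and independent blocks of the sequence can then be generated from their own starting states. The function names and the constants A_DEMO, C_DEMO and M_DEMO are illustrative assumptions, not taken from the paper.

```python
# Illustrative LCG parameters (assumed for this sketch, not from the paper).
A_DEMO, C_DEMO, M_DEMO = 1664525, 1013904223, 2**32


def lcg_step(x, a, c, m):
    """Advance the generator by one step."""
    return (a * x + c) % m


def lcg_jump(a, c, m, k):
    """Return (A, C) such that x -> (A*x + C) % m equals k LCG steps."""
    A, C = 1, 0                    # identity map: zero steps taken so far
    cur_a, cur_c = a % m, c % m    # map for 2**i steps, starting with i = 0
    while k > 0:
        if k & 1:
            # Compose the accumulated map with the current power-of-two map.
            A, C = (cur_a * A) % m, (cur_a * C + cur_c) % m
        # Square the current map: apply it to itself.
        cur_a, cur_c = (cur_a * cur_a) % m, (cur_a * cur_c + cur_c) % m
        k >>= 1
    return A, C


def lcg_blocks(seed, a, c, m, n_blocks, block_len):
    """Generate n_blocks independent blocks of block_len values each.

    Each block computes its own starting state with lcg_jump, so the blocks
    do not depend on one another and could be assigned to separate workers.
    """
    blocks = []
    for b in range(n_blocks):
        A, C = lcg_jump(a, c, m, b * block_len)
        x = (A * seed + C) % m     # state after b * block_len steps
        block = []
        for _ in range(block_len):
            x = lcg_step(x, a, c, m)
            block.append(x)
        blocks.append(block)
    return blocks
```

Concatenating the blocks reproduces the sequential stream; on a GPU, the outer loop over blocks is the part one would map to threads.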

Sep, 1

A GPU Support for Large Scale Quantum Chemistry Applications

GPU/GPGPU computing has been used widely in scientific simulation to improve the performance on hybrid architectures. The quantum chemistry field has benefited greatly from using GPUs, including tasks such as visualization of molecular orbitals and computation of electronic structures. To gain significant success in using GPUs, a large amount of code rewriting and restructuring is […]

CUDA

Sep, 1

GAROP: Genetic Algorithm framework for Running On Parallel environments

This research proposes GAROP, a Genetic Algorithm framework for Running On Parallel environments. GAROP provides a library for parallel processing, so users need only write the code for their genetic algorithm (GA) programs and rely on the library for the parts that require parallel processing. In the GAROP framework, GA […]

CUDA

Sep, 1

Binomial American Option Pricing on CPU-GPU Heterogeneous System

We present a novel parallel binomial algorithm to compute prices of American options. The algorithm partitions a binomial tree into blocks of multiple levels of nodes, and assigns each such block to multiple processors. Each processor in parallel with the others computes the option’s values at nodes assigned to it. The computation consists of two […]

CUDA
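For orientation, the following is a minimal sequential sketch of binomial pricing for an American put using the standard Cox-Ross-Rubinstein parameterization. It is not the paper's block-partitioned parallel algorithm, whose details are cut off in the abstract above; it only shows the backward induction that such an algorithm distributes. The function name and parameters are assumptions made for illustration.

```python
import math


def american_put_crr(S0, K, r, sigma, T, n):
    """Price an American put on an n-step Cox-Ross-Rubinstein binomial tree."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)              # one-step discount factor

    # Payoffs at maturity; node j has j up-moves and n - j down-moves.
    values = [max(K - S0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]

    # Backward induction: at each node take the larger of the discounted
    # continuation value and the immediate exercise value.
    for level in range(n - 1, -1, -1):
        values = [
            max(
                disc * (p * values[j + 1] + (1.0 - p) * values[j]),
                K - S0 * u**j * d**(level - j),
            )
            for j in range(level + 1)
        ]
    return values[0]
```

Within one level, every node depends only on two nodes of the next level, so the nodes of a level can be evaluated independently; that is the property a parallel partitioning of the tree can exploit.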

Sep, 1

A Map-Reduce-Like System for Programming and Optimizing Data-Intensive Computations on Emerging Parallel Architectures

Parallel computing environments are now ubiquitous, ranging from traditional CPU clusters to the emerging GPU clusters and CPU-GPU clusters, which are attractive for their performance, cost and energy efficiency. With this trend, an important research issue is to effectively utilize the massive computing power in these architectures to accelerate data-intensive applications arising from commercial and scientific domains. […]

CUDA

Aug, 31

Multi-GPU Implementation of the Uniformization Method for Solving Markov Models

Markovian models can generate very large sparse matrices, which are difficult to store and solve. Uniformization is a useful method for finding transient probabilities in Markovian models. The aim of this paper is to show that the performance of uniformization can be improved using a multi-GPU architecture. We propose a partitioning scheme for HYB sparse […]

CUDA
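As background for the entry above, here is a minimal dense, single-threaded sketch of textbook uniformization for a continuous-time Markov chain with generator matrix Q. It says nothing about the paper's multi-GPU partitioning or the HYB sparse format, which the truncated abstract does not describe. The function name, the tolerance and the iteration cap are assumptions, and the plain exp(-lam * t) weight underflows for large lam * t.

```python
import math


def uniformization(Q, pi0, t, tol=1e-12, max_terms=100000):
    """Transient distribution pi(t) of a CTMC with generator Q (list of rows).

    Uses pi(t) = sum_k Poisson(k; lam*t) * pi0 * P**k with P = I + Q / lam,
    where lam is the largest exit rate.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))
    if lam == 0.0:
        return list(pi0)               # no transitions are possible

    # Uniformized discrete-time transition matrix.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]

    weight = math.exp(-lam * t)        # Poisson weight for k = 0
    vec = list(pi0)                    # holds pi0 * P**k
    result = [weight * v for v in vec]
    acc = weight                       # accumulated Poisson mass
    k = 0
    while 1.0 - acc > tol and k < max_terms:
        k += 1
        # Row-vector times matrix: the step a sparse GPU kernel would perform.
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        result = [r + weight * v for r, v in zip(result, vec)]
        acc += weight
    return result
```

The vector-matrix product inside the loop dominates the cost, which is why the sparse storage format and its partitioning matter for large models.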

Aug, 31

CUDA-Accelerated Data-Mining for Putative Heteromeric Transcription Factors and Target Genes Using Microarray Gene Expression Profiles

Understanding protein-protein and protein-DNA interactions is key to understanding the dynamics of gene regulation [3,17]. Here we review a previously presented method [1,15,20], based on a variation of microarray expression profile correlation analysis, that seeks to identify interactions between a putative heteropolymeric transcription factor (TF) complex and DNA as well as some experimental results that bolster the […]

CUDA

Aug, 31

SWM: Simplified Wu-Manber for GPU-based Deep Packet Inspection

Graphics processing units (GPUs) have the potential to speed up deep packet inspection (DPI) by processing many packets in parallel. However, popular methods of DPI such as deterministic finite automata are limited because they are single stride. Alternatively, the complexity of multiple stride methods is not appropriate for the SIMD operation of a GPU. In this […]

OpenCL

Aug, 31

Image Object Tracking System Using Parallel Mean Shift Algorithm

We implement a real-time image object tracking system with PTZ cameras. In general, the mean shift algorithm is efficient for real-time tracking because of its fast and stable performance. However, in an image tracking system for PTZ cameras, its speed is not satisfactory. So in this paper, we use a parallel mean shift algorithm based on the […]

CUDA

Aug, 31

GPU Acceleration of Many Independent Mid-Sized Simulations on Graphs

Many GPU parallelizations exist to speed up the simulation of complex systems, but these approaches see less benefit when the simulation is not large. Simulation of many independent complex systems is useful for Monte Carlo sampling or for exploring the behavior of many different models at once. We present and evaluate an algorithm for simulating many mid-sized […]

OpenCL

Aug, 30

A File System Using GPU-Accelerated File-wise Reliability Scheme

This work revises the original file-wise reliability scheme to cope with the larger pages of today's storage devices, and implements it as a file system prototype: CRSFS. There are four layers in CRSFS: a GPU primitive for Cauchy Reed-Solomon (CRS) coding, the CrystalGPU framework, a CRS coding layer and an AFS FUSE layer. CRSFS provides GPU acceleration on the CRS […]

CUDA

Aug, 30

The multi-GPU System with ExpEther

Clusters with multiple GPUs are already widely used to build high-performance computers economically. However, since the number of GPUs that can be plugged into a CPU is limited, such clusters consist of multiple host PCs, each of which has a few GPUs. This conventional multi-GPU cluster requires programmers to learn parallel programming skills for controlling […]

CUDA
