high performance computing on graphics processing units: hgpu.org

Posts

Sep, 26

GPU Shape Grammars

GPU Shape Grammars provide a solution for interactive procedural generation, tuning and visualization of massive environment elements for both video games and production rendering. Our technique generates detailed models without explicit geometry storage. To this end, we reformulate the grammar expansion for generation of detailed models at the tessellation control and geometry shader stages. Using […]

Sep, 26

Enabling Development of OpenCL Applications on FPGA platforms

FPGAs can potentially deliver tremendous acceleration in high-performance server and embedded computing applications. Whether used to augment a processor or as a stand-alone device, these reconfigurable architectures are being deployed in a large number of implementations owing to the massive amounts of parallelism offered. At the same time, a significant challenge encountered in their widespread […]

OpenCL

Sep, 26

A Parallel Auxiliary Grid AMG Method for GPU

In this paper, we develop a new parallel auxiliary grid algebraic multigrid (AMG) method to leverage the power of graphics processing units (GPUs). In the construction of the hierarchical coarse grid, we use a simple and fixed coarsening procedure based on a region quadtree generated from an auxiliary grid. This allows us to explicitly control […]

CUDA

Sep, 26

Accelerating Iterative SpMV for Discrete Logarithm Problem using GPUs

In the cryptanalytic context, computing discrete logarithms in large cyclic groups using index-calculus-based methods, such as the number field sieve or the function field sieve, requires solving large sparse systems of linear equations modulo the group order. Most of the fast algorithms used to solve such systems — e.g., the conjugate gradient or the Lanczos […]

CUDA
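
The core operation named in the title above, a sparse matrix-vector product taken modulo the group order, can be illustrated with a minimal CPU sketch. This is not the paper's GPU implementation: the CSR layout, the function name `spmv_mod`, and the tiny example matrix and modulus are all assumptions chosen for illustration.

```python
def spmv_mod(indptr, indices, data, x, modulus):
    """Compute y = A @ x (mod modulus) for a matrix A stored in CSR form.

    indptr, indices, data follow the usual CSR convention (hypothetical
    layout choice; the paper's storage format is not stated in the excerpt).
    """
    y = []
    for row in range(len(indptr) - 1):
        acc = 0
        # Walk the nonzero entries of this row.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        # Python integers are arbitrary precision, so one reduction per row suffices.
        y.append(acc % modulus)
    return y


# Illustrative 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [2, 1, 3, 4, 5]
x = [1, 2, 3]
y = spmv_mod(indptr, indices, data, x, 7)
```

Each row is independent of the others, which is what makes the product a natural candidate for one-thread-per-row (or one-warp-per-row) parallelization on a GPU.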

Sep, 25

GPF: a framework for general packet classification on GPU co-processors

This thesis explores the design and experimental implementation of GPF, a novel protocol-independent, multi-match packet classification framework. This framework is targeted and optimised for flexible, efficient execution on NVIDIA GPU platforms through the CUDA API, but should not be difficult to port to other platforms, such as OpenCL, in the future. GPF was conceived and […]

CUDA

Sep, 25

Implementation and Analysis of AES Encryption on GPU

GPUs continue to vastly outperform CPUs while becoming more general purpose. To improve the efficiency of the AES algorithm, this paper proposes a CUDA implementation of the Electronic Codebook (ECB) mode encryption process and the Cipher Block Chaining (CBC) mode decryption process on GPU. In our implementation, the frequently accessed T-boxes were allocated on […]

CUDA

Sep, 25

Speeding up Scoring Module of Mass Spectrometry Based Protein Identification by GPU

Database searching is a main method for protein identification in shotgun proteomics, and many research efforts are dedicated to improving its effectiveness. However, the efficiency of database searching faces a serious challenge due to the ever faster growth of protein and peptide databases resulting from genome translations, enzymatic digestions, and post-translational modifications (PTMs). On […]

CUDA

Sep, 25

GPU Accelerated Lambert Solution Methods for the Orbital Targeting Problem

Lambert's problem is concerned with the determination of an orbit that connects two position vectors within a specified time of flight. It must often be solved millions of times, especially when one is conducting global searches for possible gravity assist missions, which requires fast, efficient solutions. The orbital targeting problem lends itself well to parallel […]

CUDA

Sep, 25

GPU in Physics Computation: Case Geant4 Navigation

General purpose computing on graphics processing units (GPUs) is a potential method of speeding up scientific computation with low cost and high energy efficiency. We experimented with the particle physics simulation toolkit Geant4 used at CERN to benchmark its geometry navigation functionality on a GPU. The goal was to find out whether Geant4 physics simulations […]

CUDA

Sep, 24

Mobile Computational Photography, IS&T/SPIE Electronic Imaging 2013, EI 2013

Conference EI20D. This conference is intended to bring together world-class researchers and practitioners who develop and deploy imaging technologies to enable novel solutions for mobile photography. Submissions are accepted on theory, application, and experience. The scope of the conference includes:

Computation
* computational image enhancement (e.g., noise reduction, super resolution, image stabilization)
* computational […]

Sep, 24

Sound Speed Optimization Using Image Texture on CUDA

The Compute Unified Device Architecture (CUDA) is a brand new parallel processing platform making use of the unified shader design of the most current Graphics Processing Units (GPUs) from NVIDIA. In this paper, we apply this revolutionary new technology to implement the sound speed optimization (SSO) with image texture analysis for medical ultrasound imaging. The […]

CUDA

Sep, 24

Resolution of the Vlasov-Maxwell system by PIC Discontinuous Galerkin method on GPU with OpenCL

We present an implementation of a Vlasov-Maxwell solver for multicore processors. The Vlasov equation describes the evolution of charged particles in an electromagnetic field, which is itself a solution of the Maxwell equations. The Vlasov equation is solved by a Particle-In-Cell (PIC) method, while the Maxwell system is computed by a Discontinuous Galerkin method. We use the OpenCL framework, […]

OpenCL
