high performance computing on graphics processing units: hgpu.org

Posts

Sep, 25

Implementation and Analysis of AES Encryption on GPU

The GPU continues its trend of vastly outperforming the CPU while becoming more general purpose. To improve the efficiency of the AES algorithm, this paper proposes a CUDA implementation of the Electronic Codebook (ECB) mode encryption process and the Cipher Block Chaining (CBC) mode decryption process on the GPU. In our implementation, the frequently accessed T-boxes were allocated on […]

CUDA

Sep, 25

Speeding up Scoring Module of Mass Spectrometry Based Protein Identification by GPU

Database searching is a main method for protein identification in shotgun proteomics, and many research efforts are dedicated to improving its effectiveness. However, the efficiency of database searching faces a serious challenge due to the ever-faster growth of protein and peptide databases resulting from genome translations, enzymatic digestions, and post-translational modifications (PTMs). On […]

CUDA

Sep, 25

GPU Accelerated Lambert Solution Methods for the Orbital Targeting Problem

Lambert's problem is concerned with the determination of an orbit that connects two position vectors within a specified time of flight. It must often be solved millions of times, especially when one is conducting global searches for possible gravity assist missions, which requires fast, efficient solutions. The orbital targeting problem lends itself well to parallel […]

CUDA

Sep, 25

GPU in Physics Computation: Case Geant4 Navigation

General-purpose computing on graphics processing units (GPUs) is a potential method of speeding up scientific computation with low cost and high energy efficiency. We experimented with the particle physics simulation toolkit Geant4 used at CERN to benchmark its geometry navigation functionality on a GPU. The goal was to find out whether Geant4 physics simulations […]

CUDA

Sep, 24

Mobile Computational Photography, IS&T/SPIE Electronic Imaging 2013, EI 2013

Conference EI20D. This conference is intended to bring together world-class researchers and practitioners who develop and deploy imaging technologies to enable novel solutions for mobile photography. Submissions are accepted on theory, application, and experience. The scope of the conference includes: Computation * computational image enhancement (e.g., noise reduction, super resolution, image stabilization) * computational […]

Sep, 24

Sound Speed Optimization Using Image Texture on CUDA

The Compute Unified Device Architecture (CUDA) is a brand new parallel processing platform making use of the unified shader design of the most current Graphics Processing Units (GPUs) from NVIDIA. In this paper, we apply this revolutionary new technology to implement the sound speed optimization (SSO) with image texture analysis for medical ultrasound imaging. The […]

CUDA

Sep, 24

Resolution of the Vlasov-Maxwell system by PIC Discontinuous Galerkin method on GPU with OpenCL

We present an implementation of a Vlasov-Maxwell solver for multicore processors. The Vlasov equation describes the evolution of charged particles in an electromagnetic field that is itself a solution of the Maxwell equations. The Vlasov equation is solved by a Particle-In-Cell (PIC) method, while the Maxwell system is computed by a Discontinuous Galerkin method. We use the OpenCL framework, […]

OpenCL

Sep, 24

A Hardware-Accelerated Parallel Implementation of a Two-Dimensional Scheme for Free Surface Flows

This contribution concerns the verification and performance assessment of a hardware-accelerated parallel implementation of an algorithm for the semi-implicit finite difference method for solving the vertically integrated shallow water equations including a non-linear treatment of wetting and drying and conservative advection schemes. Instead of adapting an existing serial, OpenMP-, or MPI-parallelised code with all necessary […]

CUDA

Sep, 24

ACO on Multiple GPUs with CUDA for Faster Solution of QAPs

In this paper, we implement ACO algorithms on a PC with four GTX 480 GPUs. We implement two types of ACO models: the island model and the master/slave model. When we compare the two, the island model shows promising speedup values on class (iv) QAP instances. On the other […]

CUDA

Sep, 24

GPU-based Offset Surface Computation using Point Samples

We present an efficient algorithm to perform approximate offsetting operations on geometric models using GPUs. Our approach approximates the boundary of an object with point samples and computes the offset by merging the balls centered at these points. The underlying approach uses Layered Depth Images (LDI) to organize the samples into structured points and performs […]

CUDA

Sep, 23

Exploring Multi-level Parallelism for Large-Scale Spiking Neural Networks

Several biologically inspired applications have been motivated by Spiking Neural Networks (SNNs) such as the Hodgkin-Huxley (HH) and Izhikevich models, owing to their high biological accuracy. The inherently massively parallel nature of SNN simulations makes them a good fit for heterogeneous computing resources such as General-Purpose Graphics Processing Unit (GPGPU) clusters. In […]

CUDA

Sep, 23

Adaptive Treelet Meshes for Efficient Streak-Surface Visualization on the GPU

We describe a novel adaptive mesh representation for streak-surfaces. The surface is represented as a mesh of small trees of initial depth zero (treelets). This mesh representation allows for efficient integration, refinement, coarsening and appending of surface patches utilizing the computational capacities of modern GPUs. Integration, refinement, and rendering are strictly separated into effectively parallelizable […]

CUDA

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
