high performance computing on graphics processing units: hgpu.org

Posts

Apr, 8

On accelerating iterative algorithms with CUDA: A case study on Conditional Random Fields training algorithm for biological sequence alignment

The accuracy of Conditional Random Fields (CRF) is achieved at the cost of huge amount of computation to train model. In this paper we designed the parallelized algorithm for the Gradient Ascent based CRF training methods for biological sequence alignment. Our contribution is mainly on two aspects: 1) We flexibly parallelized the different iterative computation […]

CUDA

Apr, 8

A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation

In this paper, we describe the implementation of gravitational force calculation for N-body simulations in the context of astrophysics. It will describe high performance implementations on general purpose processors, GPUs, and FPGAs, and compare them using a number of criteria including speed performance, power efficiency and cost of development. These results show that, for gravitational […]

Apr, 8

Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+

Graphic Processing Unit (GPU), with many light-weight data-parallel cores, can provide substantial parallel computing power to accelerate several general purpose applications. Both the AMD and NVIDIA corps provide their specific high performance GPUs and software platforms. As the floating-point computing capacity increases continually, the problem of “memory-wall” becomes more serious, especially for array-intensive applications. In […]

CUDA

Apr, 8

CANSCID-CUDA

The 2010 MEMOCODE Hardware Software Co-design challenge is to implement a Deep Packet Inspection architecture, called the CANSCID – Combined Architecture for Stream Categorization and Intrusion Detection. In this short paper, we present the design details of our submission, that utilizes a Graphical Processing Unit (GPU) to accelerate the parallel regular expression matching. The target […]

CUDA

Apr, 8

Shape Manipulation on GPU

This paper proposes a novel hardware-accelerating deformation algorithm based on curve-skeleton model for 2D shape manipulation. The deformation algorithm can achieve real-time interactive shape manipulation without any pre-computing step. The deforming regions of shapes are demarcated with a simple skeleton frame and are simulated by a curve-skeleton model consisting of triangle-strips. The algorithm obtains two […]

OpenGL

Apr, 7

High throughput multiple-precision GCD on the CUDA architecture

Investigation of the cryptanalytic strength of RSA cryptography requires computing many GCDs of two long integers (e.g., of length 1024 bits). This paper presents a high throughput parallel algorithm to perform many GCD computations concurrently on a GPU based on the CUDA architecture. The experiments with an NVIDIA GeForce GTX285 GPU and a single core […]

CUDA

Apr, 7

Program Optimization of Stencil Based Application on the GPU-Accelerated System

Graphic Processing Unit (GPU), with many light-weight data-parallel cores, can provide substantial parallel computational power to accelerate general purpose applications. But the powerful computing capacity could not be fully utilized for memory-intensive applications, which are limited by off-chip memory bandwidth and latency. Stencil computation has abundant parallelism and low computational intensity which make it a […]

Apr, 7

Scalability of Higher-Order Discontinuous Galerkin FEM Computations for Solving Electromagnetic Wave Propagation Problems on GPU Clusters

A highly parallel implementation of Maxwell’s equations in the time domain using a cluster of Graphics Processing Units (GPUs) is presented. The higher-order Discontinuous Galerkin Finite Element Method (DG-FEM) is used for spatial discretization since its characteristics are matching the parallelization design aspects of the NVIDIA Compute Unified Device Architecture (CUDA) programming model. Asynchronous data […]

CUDA

Apr, 7

Using GPU to Accelerate Cache Simulation

Caches play a major role in the performance of high speed computer systems. Trace driven simulator is the most widely used method to evaluate cache architectures. However, as the cache design moves to more complicated architectures, along with the size of the trace is becoming larger and larger. Traditional simulation methods are no longer practical […]

CUDA

Apr, 7

Accelerating Phase Correlation Functions Using GPU and FPGA

In this paper, we present a comparison study about implementations of phase correlation function using GPUs, ASIC and FPGAs. The Phase Only Correlation(POC) method demonstrates high robustness and subpixel accuracy in the pattern matching and the image registration. However, there is a disadvantage in computational speed because of the calculation of 2D-FFT etc. We have […]

CUDA

Apr, 7

A Neighborhood Grid Data Structure for Massive 3D Crowd Simulation on GPU

Simulation and visualization of emergent crowd in real-time is a computationally intensive task. This intensity mostly comes from the O(n2) complexity of the traversal algorithm, necessary for the proximity queries of all pair of entities in order to compute the relevant mutual interactions. Previous works reduced this complexity by considerably factors, using adequate data structures […]

CUDA

Apr, 7

Context-aware volume navigation

The trackball metaphor is exploited in many applications where volumetric data needs to be explored. Although it provides an intuitive way to inspect the overall structure of objects of interest, an in-detail inspection can be tedious – or when cavities occur even impossible. Therefore we propose a context-aware navigation technique for the exploration of volumetric […]

OpenCL

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

high performance computing on graphics processing units: hgpu.org

Posts

On accelerating iterative algorithms with CUDA: A case study on Conditional Random Fields training algorithm for biological sequence alignment

A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation

Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+

CANSCID-CUDA

Shape Manipulation on GPU

High throughput multiple-precision GCD on the CUDA architecture

Program Optimization of Stencil Based Application on the GPU-Accelerated System

Scalability of Higher-Order Discontinuous Galerkin FEM Computations for Solving Electromagnetic Wave Propagation Problems on GPU Clusters

Using GPU to Accelerate Cache Simulation

Accelerating Phase Correlation Functions Using GPU and FPGA

A Neighborhood Grid Data Structure for Massive 3D Crowd Simulation on GPU

Context-aware volume navigation

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)