high performance computing on graphics processing units: hgpu.org

Posts

Sep, 21

A GPU Implementation of Parallel Constraint-based Local Search

In this paper we study the performance of constraint-based local search solvers on a GPU. The massively parallel architecture of the GPU makes it possible to exploit parallelism at two levels inside the local search algorithm: first, by executing multiple copies of the algorithm in a multi-walk manner, and second, by evaluating large neighborhoods […]

CUDA
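The two levels of parallelism named above (independent walks, and scoring every move in a neighborhood) can be sketched on a stand-in problem. The truncated abstract does not say which solver or benchmarks the paper uses, so the following is only a minimal sketch under assumptions: N-queens as the constraint problem, min-conflicts with random-walk noise as the local search, and hypothetical helpers `conflicts`, `single_walk` and `multi_walk`. The walks run sequentially here; on a GPU each walk could map to a thread block and the neighborhood scoring loop to its threads.

```python
import random

def conflicts(board, col, row):
    """Queens attacking a queen at (row, col); board[c] is the row of the queen in column c."""
    return sum(
        1
        for c, r in enumerate(board)
        if c != col and (r == row or abs(r - row) == abs(c - col))
    )

def total_conflicts(board):
    """Number of attacking pairs (each pair is counted twice by `conflicts`)."""
    return sum(conflicts(board, c, r) for c, r in enumerate(board)) // 2

def single_walk(n, seed, max_steps=20_000, noise=0.1):
    """One min-conflicts walk (hypothetical stand-in for the paper's solver)."""
    rng = random.Random(seed)
    board = [rng.randrange(n) for _ in range(n)]
    for _ in range(max_steps):
        bad = [c for c, r in enumerate(board) if conflicts(board, c, r) > 0]
        if not bad:
            return board
        col = rng.choice(bad)
        if rng.random() < noise:
            board[col] = rng.randrange(n)  # random move to escape local minima
            continue
        # Neighborhood evaluation: every candidate row is scored independently,
        # which is the part that could be spread across GPU threads.
        scores = [conflicts(board, col, row) for row in range(n)]
        best = min(scores)
        board[col] = rng.choice([row for row, s in enumerate(scores) if s == best])
    return None

def multi_walk(n, walks=8):
    """Independent walks with different seeds; the first solution found wins."""
    for seed in range(walks):
        solution = single_walk(n, seed)
        if solution is not None:
            return solution
    return None
```

For example, `multi_walk(8)` returns a list of eight row indices with no attacking pair. Because the walks share nothing, they need no synchronization until one of them reports a solution.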

Sep, 21

GPU Accelerated Parameter Estimation by Global Optimization using Interval Analysis

This master's thesis treats non-linear parameter estimation using global optimization methods based on interval analysis (IA), accelerated by a parallel implementation on a Graphics Processing Unit (GPU). Global optimization using IA is a mathematically rigorous Branch & Bound-type method, capable of reliably solving global optimization problems with continuously differentiable objective functions, even in […]

CUDA
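The Branch & Bound idea behind IA-based global optimization can be illustrated in one dimension: evaluating the objective on an interval gives a guaranteed lower bound for that box, any point evaluation gives an upper bound on the global minimum, and boxes whose lower bound exceeds the best upper bound are discarded. The sketch below is an illustration under assumptions, not the thesis's implementation: the `Interval` class, the test function `f`, and `minimize` are hypothetical, and floating-point rounding is not directed outward, so unlike true IA it is not fully rigorous.

```python
import heapq

class Interval:
    """Closed interval [lo, hi] (no outward rounding, so not fully rigorous)."""
    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)
    __radd__ = __add__

    def __sub__(self, other):
        other = _as_interval(other)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        other = _as_interval(other)
        p = [self.lo * other.lo, self.lo * other.hi, self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))
    __rmul__ = __mul__

def _as_interval(x):
    return x if isinstance(x, Interval) else Interval(x)

def f(x):
    """Hypothetical objective; works on floats and on Interval objects."""
    return x * x * x * x - 3 * x * x + x

def minimize(f, lo, hi, tol=1e-6):
    """Best-first Branch & Bound; returns (lower, upper) bounds on min f over [lo, hi]."""
    best_upper = f(0.5 * (lo + hi))  # any point value bounds the minimum from above
    heap = [(f(Interval(lo, hi)).lo, lo, hi)]
    while heap:
        bound, a, b = heapq.heappop(heap)  # box with the smallest lower bound
        if bound > best_upper:
            continue  # cannot contain the global minimum: prune
        if b - a < tol:
            return bound, best_upper
        m = 0.5 * (a + b)
        best_upper = min(best_upper, f(m))
        for c, d in ((a, m), (m, b)):  # bisect and bound both halves
            lower = f(Interval(c, d)).lo
            if lower <= best_upper:
                heapq.heappush(heap, (lower, c, d))
    return best_upper, best_upper
```

Calling `minimize(f, -2.0, 2.0)` yields a narrow enclosure around the global minimum near x ≈ -1.30 while pruning the shallower local minimum near x ≈ 1.13. The boxes still in the heap are independent, which is what makes the method amenable to evaluating many boxes in parallel.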

Sep, 20

Preconditioned conjugate gradient solver for structural problems

Matrix solvers play a crucial role in solving real-world physics problems. In engineering practice, transition analysis is most often used, which requires a series of linear systems with similar matrices to be solved. However, no single solver, with or without a preconditioner, achieves high performance gains for all matrices. This paper recommends the Conjugate Gradient iterative solver with an SSOR approximate […]

CUDA
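A preconditioned Conjugate Gradient solver with an SSOR preconditioner can be sketched as follows. Applying the SSOR preconditioner M = (D + ωL) D⁻¹ (D + ωU) / (ω(2 − ω)) to a residual costs one forward and one backward triangular sweep. This is a minimal, dense, pure-Python sketch under assumptions: the relaxation factor ω = 1.2, the tolerance, and the helper names (`ssor_apply`, `pcg_ssor`) are hypothetical choices, not taken from the paper, and a structural code would use sparse storage.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def ssor_apply(A, r, omega=1.2):
    """Return z = M^{-1} r for M = (D + omega L) D^{-1} (D + omega U) / (omega (2 - omega))."""
    n = len(A)
    y = [0.0] * n
    for i in range(n):  # forward sweep: (D + omega L) y = omega (2 - omega) r
        s = omega * (2.0 - omega) * r[i] - omega * sum(A[i][j] * y[j] for j in range(i))
        y[i] = s / A[i][i]
    dy = [A[i][i] * y[i] for i in range(n)]  # multiply by D
    z = [0.0] * n
    for i in reversed(range(n)):  # backward sweep: (D + omega U) z = D y
        s = dy[i] - omega * sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / A[i][i]
    return z

def pcg_ssor(A, b, omega=1.2, tol=1e-10, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite A; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    if dot(r, r) ** 0.5 < tol:
        return x, 0
    z = ssor_apply(A, r, omega)
    p = list(z)
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, k
        z = ssor_apply(A, r, omega)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

As a usage example, a 20×20 tridiagonal matrix with 2 on the diagonal and −1 off the diagonal (a 1D Poisson model problem, assumed here as a stand-in for a structural stiffness matrix) is solved to a residual well below 1e-8. Note that the triangular sweeps are inherently sequential, which is one reason preconditioner choice matters so much on a GPU.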

Sep, 20

Can GPUs Sort Strings Efficiently?

String sorting, or variable-length key sorting, has lagged in performance on the GPU even as fixed-length key sorting has improved dramatically. Radix sort is the fastest sorting method on GPUs. In this paper, we present a fast and efficient string sort on the GPU that is built on the available radix sort. Our method sorts […]

CUDA
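One way to build a variable-length string sort on top of a fixed-length key radix sort is to sort repeatedly on a few characters at a time, prefixing each key with a segment id that records which strings are still tied on their earlier characters. The sketch below shows that general idea only; it is not claimed to be the paper's exact algorithm. The chunk size `CHUNK = 4`, the helper names, and the assumption that strings are NUL-free byte strings (so byte 0 can pad short strings) are all hypothetical. On a GPU, `radix_sort_pairs` would be a library radix sort and the segment renumbering a parallel prefix scan.

```python
CHUNK = 4  # characters compared per pass (hypothetical choice)

def radix_sort_pairs(keys, values, key_bits):
    """Stable LSD radix sort on integer keys, 8 bits per pass, carrying values along."""
    for shift in range(0, key_bits, 8):
        buckets = [[] for _ in range(256)]
        for k, v in zip(keys, values):
            buckets[(k >> shift) & 0xFF].append((k, v))
        keys = [k for bucket in buckets for k, _ in bucket]
        values = [v for bucket in buckets for _, v in bucket]
    return keys, values

def chunk_key(s, pos):
    """Pack s[pos:pos+CHUNK] into an int; missing bytes become 0 so shorter strings sort first."""
    key = 0
    for i in range(pos, pos + CHUNK):
        key = (key << 8) | (s[i] if i < len(s) else 0)
    return key

def string_sort(strings):
    """Sort NUL-free byte strings using only fixed-length integer radix sorts."""
    n = len(strings)
    order = list(range(n))  # permutation refined by each pass
    segment = [0] * n       # segment[i] is the tie group of strings[order[i]]
    max_len = max((len(s) for s in strings), default=0)
    key_bits = n.bit_length() + 8 * CHUNK
    for pos in range(0, max_len, CHUNK):
        # Segment id in the high bits keeps earlier decisions; next chunk breaks ties.
        keys = [(segment[i] << (8 * CHUNK)) | chunk_key(strings[order[i]], pos) for i in range(n)]
        keys, order = radix_sort_pairs(keys, order, key_bits)
        # Renumber segments: strings with equal (segment, chunk) stay tied.
        segment = [0] * n
        for i in range(1, n):
            segment[i] = segment[i - 1] + (keys[i] != keys[i - 1])
    return [strings[i] for i in order]
```

For example, `string_sort([b"band", b"ban", b"apple", b""])` returns `[b"", b"apple", b"ban", b"band"]`. The number of passes grows with the longest string, which hints at why variable-length keys are harder than fixed-length ones.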

Sep, 20

gNek: A GPU Accelerated Incompressible Navier Stokes Solver

This thesis presents a GPU-accelerated implementation of a high-order splitting scheme with a spectral element discretization for the incompressible Navier-Stokes (INS) equations. While others have implemented this scheme on clusters of processors using the Nek5000 code, to my knowledge this thesis is the first to explore its performance on the GPU. This […]

OpenCL

Sep, 20

ClusterWatch: Flexible, Lightweight Monitoring for High-end GPGPU Clusters

The ClusterWatch middleware provides runtime flexibility in what system-level metrics are monitored, how frequently such monitoring is done, and how metrics are combined to obtain reliable information about the current behavior of GPGPU clusters. Interesting attributes of ClusterWatch are (1) the ease with which different metrics can be added to the system by simply deploying additional […]

CUDA

Sep, 20

Comprehensive Evaluations of Cone-beam CT dose in Image-guided Radiation Therapy via GPU-based Monte Carlo simulations

Cone-beam CT (CBCT) has been widely used for patient setup in image-guided radiation therapy (IGRT). Radiation dose from CBCT scans has become a clinical concern. The purposes of this study are 1) to commission a GPU-based Monte Carlo (MC) dose calculation package, gCTD, for the Varian On-Board Imaging (OBI) system and test the calculation […]

CUDA

Sep, 20

The Security of Key Derivation Functions in WINRAR

In various versions of WINRAR, file security is mainly provided by user authentication and file encryption. The password-based key derivation function (PBKDF) is the core of the WINRAR security mechanism. In this paper, the security of the PBKDF algorithm and of encrypted files in WINRAR is analyzed using the game-playing approach. We show the upper […]

OpenCL
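To make the role of a password-based key derivation function concrete, the sketch below implements generic PBKDF2 (RFC 8018) with HMAC-SHA256 as the pseudorandom function and checks it against Python's built-in `hashlib.pbkdf2_hmac`. It is shown for illustration only and is not claimed to match the exact KDF, hash, or iteration count of any particular WINRAR version; the function name `pbkdf2_sha256` is hypothetical. The iteration count sets how many HMAC calls each password guess costs, which is the work a GPU (e.g. OpenCL) attacker parallelizes across candidate passwords.

```python
import hashlib
import hmac

def pbkdf2_sha256(password: bytes, salt: bytes, iterations: int, dk_len: int) -> bytes:
    """PBKDF2 (RFC 8018) with HMAC-SHA256; written out to expose the iteration loop."""
    h_len = hashlib.sha256().digest_size
    blocks = -(-dk_len // h_len)  # ceiling division: number of output blocks
    out = b""
    for i in range(1, blocks + 1):
        # U_1 = PRF(P, S || INT(i)); U_j = PRF(P, U_{j-1}); T_i = U_1 xor ... xor U_c
        u = hmac.new(password, salt + i.to_bytes(4, "big"), hashlib.sha256).digest()
        t = int.from_bytes(u, "big")
        for _ in range(iterations - 1):
            u = hmac.new(password, u, hashlib.sha256).digest()
            t ^= int.from_bytes(u, "big")
        out += t.to_bytes(h_len, "big")
    return out[:dk_len]
```

In practice one would simply call `hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dk_len)`; the explicit loop is only there to show that every guess requires `iterations` sequential HMAC evaluations per output block.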

Sep, 20

Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone

Feature detection and extraction are essential in computer vision applications such as image matching and object recognition. The Scale-Invariant Feature Transform (SIFT) algorithm is one of the most robust approaches to detect and extract distinctive invariant features from images. However, high computational complexity makes it difficult to apply the SIFT algorithm to mobile applications. Recent […]

OpenCL

Sep, 20

Rethinking the Union of Computed Tomography Reconstruction and GPGPU Computing

This work presents how the massively multi-threaded environment of graphics processors (GPUs) can be used to reduce the computation time needed to reconstruct large computed tomography (CT) datasets, and the challenges that arise for system implementation. Intelligent algorithm design for massively multi-threaded graphics processors differs greatly from traditional CPU algorithm design. Although a brute force port […]

CUDA

Sep, 20

Performance analysis of multi-core CPUs and GPU computing on SF-FDTD scheme for third order nonlinear materials and periodic media

The Split-Field Finite-Difference Time-Domain (SF-FDTD) scheme is an optimal formulation for modeling periodic optical media by means of a single unit period. The split-field components and the Periodic Boundary Condition (PBC) at the periodic boundaries allow successful results to be obtained even at oblique angles of incidence. In this situation, the standard FDTD scheme requires multiple […]

CUDA

Sep, 20

Exponential Integrators on Graphics Processing Units

In this paper we revisit stencil methods on GPUs in the context of exponential integrators. We further discuss boundary conditions, in the same context, and show that simple boundary conditions (for example, homogeneous Dirichlet or homogeneous Neumann boundary conditions) do not affect the performance if implemented directly into the CUDA kernel. In addition, we show […]

CUDA
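The point about implementing homogeneous Dirichlet boundary conditions directly in the stencil kernel can be illustrated with a matrix-free exponential integrator step. In the sketch below, the 1D three-point Laplacian treats values outside the grid as zero inside the stencil itself, so no ghost cells or boundary passes are needed, and exp(tA)v is approximated using nothing but repeated stencil applications. This is a sketch under assumptions: the paper's method for computing the matrix exponential action is not given in the abstract, a truncated Taylor series is swapped in for brevity (it is only accurate when t times the norm of A is modest), and the helper names are hypothetical.

```python
import math

def laplacian_dirichlet(u, h):
    """Apply the 3-point stencil (u[i-1] - 2 u[i] + u[i+1]) / h^2.

    Homogeneous Dirichlet boundaries are handled inside the stencil: neighbors
    outside the grid are taken as 0, so no ghost cells are stored.
    """
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out[i] = (left - 2.0 * u[i] + right) / (h * h)
    return out

def expm_action_taylor(apply_a, v, t, terms=40):
    """Approximate exp(t A) v as sum_k (t A)^k v / k!, using only applications of A."""
    result = list(v)
    term = list(v)
    for k in range(1, terms):
        term = [t * x / k for x in apply_a(term)]  # (t A)^k v / k! from the previous term
        result = [r + x for r, x in zip(result, term)]
    return result
```

A convenient check uses the grid functions sin(πm x_j), with x_j = (j + 1)h and h = 1/(n + 1), which are exact eigenvectors of this Dirichlet stencil with eigenvalue −(4/h²) sin²(πmh/2); one step of `expm_action_taylor` should then scale them by exp(tλ). Because each stencil application is a pure gather over neighbors, the boundary handling adds only a branch per point rather than a separate kernel.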