
Posts

Sep, 21

Optimization solutions for the segmented sum algorithmic function

This paper presents optimization solutions for the segmented sum algorithmic function, developed using the Compute Unified Device Architecture (CUDA), a powerful and efficient solution for optimizing a wide range of applications. The parallel segmented sum is often used in building many data processing algorithms, and through its optimization one can improve the overall […]
Sep, 21

A streaming model for nested data parallelism

Efficient parallel algorithms are often written with embedded knowledge of the back-end that they are meant to be executed on, and if they are not, the translation to the target language often produces inefficient code. A concrete problem is space complexity in nested data parallel (NDP) languages such as NESL and Data Parallel Haskell, where large […]
Sep, 21

Performing DCT8x8 Computation on GPU Using NVIDIA CUDA Technology

In this paper, we propose sequential and parallel implementations of the Discrete Cosine Transform (DCT) using Compute Unified Device Architecture (CUDA) libraries. The introduction of a programmable pipeline in graphics processing units (GPUs) has enabled configurability. The GPU, which is available in every computer, is capable of highly parallel SIMD processing, but its capability is often […]
Sep, 21

A GPU Implementation of Parallel Constraint-based Local Search

In this paper we study the performance of constraint-based local search solvers on a GPU. The massively parallel architecture of the GPU makes it possible to explore parallelism at two different levels inside the local search algorithm. First, by executing multiple copies of the algorithm in a multi-walk manner and, second, by evaluating large neighborhoods […]
Sep, 21

GPU Accelerated Parameter Estimation by Global Optimization using Interval Analysis

This master's thesis treats the topic of non-linear parameter estimation using global optimization methods based on interval analysis (IA), accelerated by parallel implementation on a Graphics Processing Unit (GPU). Global optimization using IA is a mathematically rigorous Branch & Bound-type method, capable of reliably solving global optimization problems with continuously differentiable objective functions, even in […]
Sep, 20

Preconditioned conjugate gradient solver for structural problems

Matrix solvers play a crucial role in solving real-world physics problems. In engineering practice, transition analysis is most often used, which requires a series of similar matrices to be solved. However, no specific solver, with or without a preconditioner, can achieve a high performance gain for all matrices. This paper recommends the Conjugate Gradient iterative solver with SSOR approximate […]
Sep, 20

Can GPUs Sort Strings Efficiently?

String sorting, or variable-length key sorting, has lagged in performance on the GPU even as fixed-length key sorting has improved dramatically. Radix sort is the fastest sorting method on GPUs. In this paper, we present a fast and efficient string sort on the GPU that is built on the available radix sort. Our method sorts […]
Sep, 20

gNek: A GPU Accelerated Incompressible Navier Stokes Solver

This thesis presents a GPU accelerated implementation of a high order splitting scheme with a spectral element discretization for the incompressible Navier Stokes (INS) equations. While others have implemented this scheme on clusters of processors using the Nek5000 code, to my knowledge this thesis is the first to explore its performance on the GPU. This […]
Sep, 20

ClusterWatch: Flexible, Lightweight Monitoring for High-end GPGPU Clusters

The ClusterWatch middleware provides runtime flexibility in what system-level metrics are monitored, how frequently such monitoring is done, and how metrics are combined to obtain reliable information about the current behavior of GPGPU clusters. Interesting attributes of ClusterWatch are (1) the ease with which different metrics can be added to the system, by simply deploying additional […]
Sep, 20

Comprehensive Evaluations of Cone-beam CT dose in Image-guided Radiation Therapy via GPU-based Monte Carlo simulations

Cone beam CT (CBCT) has been widely used for patient setup in image guided radiation therapy (IGRT). Radiation dose from CBCT scans has become a clinical concern. The purposes of this study are 1) to commission a GPU-based Monte Carlo (MC) dose calculation package, gCTD, for the Varian On-Board Imaging (OBI) system and test the calculation […]
Sep, 20

The Security of Key Derivation Functions in WINRAR

In various versions of WINRAR, file security is mainly protected by user authentication and file encryption. The password-based key derivation function (PBKDF) is the core of the WINRAR security mechanism. In this paper, the security of the PBKDF algorithm and of the encrypted files in WINRAR are analyzed by the Game-Playing approach. We show the upper […]
Sep, 20

Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone

Feature detection and extraction are essential in computer vision applications such as image matching and object recognition. The Scale-Invariant Feature Transform (SIFT) algorithm is one of the most robust approaches to detect and extract distinctive invariant features from images. However, high computational complexity makes it difficult to apply the SIFT algorithm to mobile applications. Recent […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
