Posts
Sep, 22
Accelerating Habanero-Java Programs with OpenCL Generation
The initial wave of programming models for general-purpose computing on GPUs, led by CUDA and OpenCL, has provided experts with low-level constructs to obtain significant performance and energy improvements on GPUs. However, these programming models are characterized by a challenging learning curve for non-experts due to their complex and low-level APIs. Looking to the future, […]
Sep, 22
Investigating the Performance of Motion Estimation Block-Matching Algorithms on GPU Cards
In the field of video compression, motion estimation (ME) is a process that leads to high computational complexity. Implementing ME block-matching (BM) algorithms on a general-purpose Central Processing Unit (CPU) has resulted in poor performance. In this paper we investigate the performance of two BM ME algorithms: Three Step Search (TSS) and Four Step […]
Sep, 22
Fast Endmember Extraction for Massive Hyperspectral Sensor Data on GPUs
Hyperspectral imaging sensors are becoming increasingly important in multi-sensor collaborative observation. The spectral mixture problem seriously influences the efficiency of hyperspectral data exploitation, and endmember extraction is one of the key issues. Due to the high computational cost of the algorithms and the massive quantity of hyperspectral sensor data, high-performance computing is in great demand for those scenarios […]
Sep, 22
Paralleling Variable Block Size Motion Estimation of HEVC on Multi-Core CPU Plus GPU Platform
Motion estimation with variable block sizes (VBSME) is one of the most complex models in the HEVC encoder. The HEVC standard supports up to 12 variable block sizes ranging from 4×8/8×4 to 64×64 for motion estimation (ME) and motion compensation (MC). This feature contributes substantial coding gain compared with 7 variable block sizes in H.264/AVC […]
Sep, 22
Geo-Correction of High-Resolution Imagery Using Fast Template Matching on a GPU in Emergency Mapping Contexts
The increasing availability of satellite imagery acquired by existing and new sensors allows a wide variety of new applications that depend on the use of diverse spectral and spatial resolution data sets. One of the pre-conditions for the use of hybrid image data sets is a consistent geo-correction capacity. We demonstrate how a novel fast […]
Sep, 21
Optimization solutions for the segmented sum algorithmic function
This paper presents optimization solutions for the segmented sum algorithmic function, developed using the Compute Unified Device Architecture (CUDA), a powerful and efficient solution for optimizing a wide range of applications. The parallel segmented sum is often used in building many data processing algorithms, and through its optimization one can improve the overall […]
Sep, 21
A streaming model for nested data parallelism
Efficient parallel algorithms are often written with embedded knowledge of the back-end that they are meant to be executed on, and if they are not, the translation to the target language often produces inefficient code. A concrete problem is space complexity in nested data parallel (NDP) languages such as NESL and Data Parallel Haskell, where large […]
Sep, 21
Performing DCT8x8 Computation on GPU Using NVIDIA CUDA Technology
In this paper, we propose sequential and parallel Discrete Cosine Transform (DCT) implementations using Compute Unified Device Architecture (CUDA) libraries. The introduction of a programmable pipeline in graphics processing units (GPUs) has enabled configurability. The GPU, which is available in every computer, is capable of highly parallel SIMD processing, but its capability is often […]
Sep, 21
A GPU Implementation of Parallel Constraint-based Local Search
In this paper we study the performance of constraint-based local search solvers on a GPU. The massively parallel architecture of the GPU makes it possible to explore parallelism at two different levels inside the local search algorithm. First, by executing multiple copies of the algorithm in a multi-walk manner and, second, by evaluating large neighborhoods […]
Sep, 21
GPU Accelerated Parameter Estimation by Global Optimization using Interval Analysis
This master's thesis treats the topic of non-linear parameter estimation using global optimization methods based on interval analysis (IA), accelerated by parallel implementation on a Graphics Processing Unit (GPU). Global optimization using IA is a mathematically rigorous Branch & Bound-type method, capable of reliably solving global optimization problems with continuously differentiable objective functions, even in […]
Sep, 20
Preconditioned conjugate gradient solver for structural problems
Matrix solvers play a crucial role in solving real-world physics problems. In engineering practice, transition analysis is most often used, which requires a series of similar matrices to be solved. However, no specific solver, with or without a preconditioner, can achieve a high performance gain for all matrices. This paper recommends the Conjugate Gradient iterative solver with SSOR approximate […]
Sep, 20
Can GPUs Sort Strings Efficiently?
String sorting, or variable-length key sorting, has lagged in performance on the GPU even as fixed-length key sorting has improved dramatically. Radix sorting is the fastest on GPUs. In this paper, we present a fast and efficient string sort on the GPU that is built on the available radix sort. Our method sorts […]

