Posts
Dec, 4
Comparative Study of High Performance Computing Using Multi-core Parallel Systems
Multi-core based high performance computing systems are available at a reasonable price. The parallel programming paradigm needs to be adjusted to the individual system. This paper compares parallel computing systems. Electroencephalography signals were collected to measure the performance of parallel computing on CPU and GPU based systems. A CPU based system showed better […]
Dec, 4
HSPA+/LTE-A Turbo Decoder on GPU and Multicore CPU
This paper compares two implementations of reconfigurable and high-throughput turbo decoders. The first implementation is optimized for an NVIDIA Kepler graphics processing unit (GPU), whereas the second implementation is for an Intel Ivy Bridge processor. Both implementations support max-log-MAP and log-MAP turbo decoding algorithms, various code rates, different interleaver types, and all block-lengths, as specified […]
Dec, 4
Divergence Analysis
The growing interest in graphics processing units has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. In particular, developers must deal with memory and control flow divergences. These phenomena stem from a condition that we call data […]
Dec, 4
Fingerprint grid enhancement on GPU
This paper presents an optimized GPU (Graphics Processing Unit) implementation for fingerprint image enhancement using a Gabor filter-bank based algorithm. Given a batch of fingerprint images, we apply the Gabor filter bank and compute image variances of the convolution responses. We then select parts of these responses and compose the final enhanced batches. The algorithm […]
Dec, 3
Multithreaded Transposition of Square Matrices with Common Code for Intel Xeon Processors and Intel Xeon Phi Coprocessors
In-place matrix transposition, a standard operation in linear algebra, is a memory bandwidth-bound operation. The theoretical maximum performance of transposition is the memory copy bandwidth. However, due to non-contiguous memory access in the transposition operation, practical performance is usually lower. The ratio of the transposition rate to the memory copy bandwidth is a measure of […]
Dec, 3
GPU and CPU Cooperative Accelerated Road Detection
In this paper, we propose a fast and robust unstructured road detection method that integrates GPU (Graphics Processing Unit) and CPU implementations. To ensure the robustness of the algorithm, a BP (Back Propagation) neural network is employed to learn the color features from a set of samples of both road and off-road regions, […]
Dec, 3
SESH framework: A Space Exploration Framework for GPU Application and Hardware Codesign
Graphics processing units (GPUs) have become increasingly popular accelerators in supercomputers, and this trend is likely to continue. Given their disruptive architecture and variety of optimization options, it is often desirable to understand the dynamics between potential application transformations and potential hardware features when designing future GPUs for scientific workloads. However, current codesign efforts […]
Dec, 3
Real-time High Resolution Fusion of Depth Maps on GPU
A system for live high quality surface reconstruction using a single moving depth camera on commodity hardware is presented. High accuracy and a real-time frame rate are achieved by utilizing graphics hardware computing capabilities via OpenCL and by using a sparse data structure for volumetric surface representation. Depth sensor pose is estimated by combining serial texture […]
Dec, 3
Accelerated Event-by-Event Neutrino Oscillation Reweighting with Matter Effects on a GPU
Oscillation probability calculations are becoming increasingly CPU intensive in modern neutrino oscillation analyses. The independence of reweighting individual events in a Monte Carlo sample lends itself to parallel implementation on a Graphics Processing Unit. The library "Prob3++" was ported to the GPU using the CUDA C API, allowing for large scale parallelized calculations of neutrino […]
Nov, 30
GenBase: A Complex Analytics Genomics Benchmark
This paper introduces a new benchmark, designed to test database management system (DBMS) performance on a mix of data management tasks (joins, filters, etc.) and complex analytics (regression, singular value decomposition, etc.). Such mixed workloads are prevalent in a number of application areas, including most science workloads and web analytics. As a specific use case, […]
Nov, 30
Fractal Based Method on Hardware Acceleration for Natural Environments
Natural scenes from the real world are highly complex, such that the modeling and rendering of natural shapes, like mountains, trees and clouds, are very difficult and time consuming and require a huge amount of memory. Intuitively, the critical characteristics of natural scenes are their self-similarity properties. Motivated by the self-similarity feature of the […]
Nov, 30
Digitize Your Body and Action in 3-D at Over 10 FPS: Real Time Dense Voxel Reconstruction and Marker-less Motion Tracking via GPU Acceleration
In this paper, we present an approach to reconstruct 3-D human motion from multiple cameras and track the human skeleton using the reconstructed human 3-D point (voxel) cloud. We use an improved and more robust algorithm, probabilistic shape from silhouette, to reconstruct human voxels. In addition, the annealed particle filter is applied for tracking, where the measurement […]