high performance computing on graphics processing units: hgpu.org

Posts

Feb, 3

3D Registration Based on Normalized Mutual Information: Performance of CPU vs. GPU Implementation

Medical image registration is time-consuming but can be sped up by employing parallel processing on the GPU. Normalized mutual information (NMI) is a well-performing similarity measure for multi-modal registration. We present CUDA-based solutions for computing NMI on the GPU and compare the results obtained by rigidly registering multi-modal data sets with a CPU […]

CUDA
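
As a rough illustration of the similarity measure named in the entry above, here is a minimal CPU sketch in Python. It assumes the commonly used definition NMI(A, B) = (H(A) + H(B)) / H(A, B), computed from a joint histogram of already-quantized intensity values; the truncated abstract does not say which variant or binning the authors use, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_mutual_information(a, b):
    """NMI of two equally sized sequences of quantized intensities.

    Assumed definition: (H(A) + H(B)) / H(A, B). Returns NaN when the
    joint entropy is zero (both inputs constant), where the ratio is undefined.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    h_a = entropy(Counter(a).values(), n)          # marginal histogram of A
    h_b = entropy(Counter(b).values(), n)          # marginal histogram of B
    h_ab = entropy(Counter(zip(a, b)).values(), n)  # joint histogram
    if h_ab == 0.0:
        return math.nan
    return (h_a + h_b) / h_ab


if __name__ == "__main__":
    fixed = [0, 1, 2, 3]
    print(normalized_mutual_information(fixed, fixed))
    print(normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Under this definition the value ranges from 1 (statistically independent intensities) to 2 (one image fully determines the other), so a registration loop would search for the transform that maximizes it. The joint histogram is the step that a GPU implementation would typically parallelize.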

Feb, 3

3D Information Extraction Based on GPU

Our project starts from a specific practical application of stereo vision (matching) on a robot arm: building a vision system that gives the arm the capability to detect the 3D information of objects on a plane. The kernel of the vision system is stereo matching. Stereo matching (correspondence) problem […]

Feb, 3

3D GPU Architecture using Cache Stacking: Performance, Cost, Power and Thermal analysis

Graphics Processing Units (GPUs) offer tremendous computational and processing power. The architecture requires high communication bandwidth and low latency between computation units and caches. 3D die-stacking technology is a promising approach to meet such requirements. To the best of our knowledge, no other study has investigated the implementation of 3D technology in GPUs. In this […]

CUDA

Feb, 3

3D finite element numerical integration on GPUs

The paper investigates the algorithmic and computational aspects of 3D finite element numerical integration on GPUs. Special emphasis is placed on selecting proper parallelization strategies depending on the properties of the FEM problems solved and the approximations used. The close interplay between the available computational resources of GPUs and the possible implementation strategies […]

CUDA

Feb, 3

Data access optimized applications on the GPU using NVIDIA CUDA

This work is an attempt to address the problem of bandwidth-limited performance of data-intensive GPGPU applications. Performance limited by memory bandwidth is a common issue faced by general data-intensive HPC applications. In the case of the GPU, this problem is more pronounced owing to the unique architecture. This problem has been tackled by optimizing […]

CUDA

Feb, 3

High Performance Power Spectrum Analysis Using a FPGA Based Reconfigurable Computing Platform

Power-spectrum analysis is an important tool providing critical information about a signal. The range of applications extends from communication systems to DNA sequencing. If there is interference present on a transmitted signal, it could be due to a natural cause or superimposed forcefully. In the latter case, its early detection and analysis become important. In such situations having […]

Feb, 2

Real-time PCA calculation for spectral imaging (using SIMD and GP-GPU)

This article presents two optimized implementations of the PCA algorithm, targeted primarily at spectral image analysis in real time. One of them utilizes the SSE instruction set of contemporary CPUs, and the other one runs on graphics processors, using the CUDA environment. The implementations are evaluated and compared with a multithreaded C implementation compiled by […]

CUDA

Feb, 2

Software parallel CAVLC encoder based on stream processing

Real-time encoding of high-definition H.264 video is a challenge for current embedded programmable processors. Emerging stream processing methods supported by most GPUs and programmable processors provide a powerful mechanism to achieve surprisingly high performance in media/signal processing, which brings an opportunity to deal with this challenge. However, traditional serial CAVLC has highly input-dependent execution and […]

Feb, 2

Cortical architectures on a GPGPU

As the number of devices available per chip continues to increase, the computational potential of future computer architectures grows likewise. While this is a clear benefit for future computing devices, future chips will also likely suffer from more faulty devices and increased power consumption. It is also likely that these chips will be difficult to […]

CUDA

Feb, 2

Learning Two-View Stereo Matching

We propose a graph-based semi-supervised symmetric matching framework that performs dense matching between two uncalibrated wide-baseline images by exploiting the results of sparse matching as labeled data. Our method utilizes multiple sources of information including the underlying manifold structure, matching preference, shapes of the surfaces in the scene, and global epipolar geometric constraints for occlusion […]

Feb, 2

Adaptive enhancement and noise reduction in very low light-level video

A general methodology for noise reduction and contrast enhancement in very noisy image data with low dynamic range is presented. Video footage recorded in very dim light is especially targeted. Smoothing kernels that automatically adapt to the local spatio-temporal intensity structure in the image sequences are constructed in order to preserve and enhance fine spatial […]

Feb, 2

Delta-stepping: a parallelizable shortest path algorithm

The single-source shortest path problem for arbitrary directed graphs with n nodes, m edges and nonnegative edge weights can be solved sequentially using O(n log n + m) operations. However, no work-efficient parallel algorithm is known that runs in sublinear time for arbitrary graphs. In this paper we present a rather simple algorithm for […]
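
The abstract above is cut off before it describes the algorithm, so the following is only a sequential Python sketch of the bucket-based scheme usually associated with the name delta-stepping, not the paper's own presentation. The adjacency-list format, the function name, and the choice of `delta` are assumptions made for illustration. Tentative distances are grouped into buckets of width `delta`; edges of weight at most `delta` ("light") are relaxed repeatedly within the current bucket, and heavier edges are relaxed once after the bucket has emptied. Each batch of relaxation requests is independent, which is the part a parallel version would distribute.

```python
import math
from collections import defaultdict


def delta_stepping(graph, source, delta):
    """Single-source shortest paths with nonnegative weights.

    graph: dict mapping every node to a list of (neighbor, weight) pairs
           (assumed format; every node must appear as a key).
    delta: bucket width, a positive number.
    """
    dist = {v: math.inf for v in graph}
    buckets = defaultdict(set)  # bucket index -> nodes with dist in that range

    def relax(v, d):
        # Move v to the bucket matching its improved tentative distance.
        if d < dist[v]:
            if dist[v] != math.inf:
                buckets[int(dist[v] // delta)].discard(v)
            dist[v] = d
            buckets[int(d // delta)].add(v)

    relax(source, 0)
    while True:
        nonempty = [i for i, b in buckets.items() if b]
        if not nonempty:
            break
        i = min(nonempty)
        settled = set()
        # Light edges may reinsert nodes into bucket i, so loop until it drains.
        while buckets[i]:
            frontier = buckets[i]
            buckets[i] = set()
            settled |= frontier
            requests = [(w, dist[v] + c)
                        for v in frontier for w, c in graph[v] if c <= delta]
            for w, d in requests:
                relax(w, d)
        # Heavy edges always land in a later bucket, so relax them once.
        requests = [(w, dist[v] + c)
                    for v in settled for w, c in graph[v] if c > delta]
        for w, d in requests:
            relax(w, d)
    return dist


if __name__ == "__main__":
    g = {"a": [("b", 1), ("c", 5)], "b": [("c", 2), ("d", 7)],
         "c": [("d", 1)], "d": []}
    print(delta_stepping(g, "a", delta=3))
```

A small `delta` makes the procedure behave like Dijkstra's algorithm with little work per bucket, while a large `delta` approaches Bellman-Ford with more redundant relaxations; tuning it trades parallelism against wasted work.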