Posts

Oct, 27

Off-axis quantitative phase imaging processing using CUDA: toward real-time applications

We demonstrate real-time off-axis Quantitative Phase Imaging (QPI) using a phase reconstruction algorithm based on NVIDIA’s CUDA programming model. The phase unwrapping component is based on Goldstein’s algorithm. By mapping the process of extracting phase information and unwrapping to GPU, we are able to speed up the whole procedure by more than 18.8x with […]
Oct, 27

Parallelization of Single Threaded Applications using OpenMP and CUDA/C

Extracting performance improvements from modest and cost-effective computing resources is one of the key challenges in the IT sector. CPU clock speeds have reached a plateau in recent years, with no significant clock speed improvements forthcoming. However, we see an increasing number of computational cores available on the desktop, via the CPU and, more recently, […]
Oct, 27

Efficient Implementation and Evaluation of Methods for the Estimation of Motion in Image Sequences

Optical flow estimation (the estimation of the apparent motion of objects in an image sequence) is used in many applications like video compression, object detection and tracking, robot navigation, and so on. This project was focussed on one specific optical flow estimation algorithm, which uses directional filters and an AM-FM demodulation algorithm for the estimation […]
Oct, 27

Efficient Implementation of Optical Flow Algorithm Based on Directional Filters on a GPU Using CUDA

This paper describes an optical flow estimation algorithm using directional filters and an AM-FM demodulation algorithm, and its efficient implementation on an NVIDIA GPU using CUDA. The resulting implementation is several thousand times faster than the corresponding MATLAB code, which makes the described scheme suitable for real-time applications. This paper also describes a new multiresolution […]
Oct, 26

Dense Dynamic Programming on Multi GPU

The implementation via CUDA of a hybrid dense dynamic programming method for knapsack problems on a multi-GPU architecture is considered. Tests are carried out on a Bull cluster with Tesla S1070 computing systems. A first series of computational results shows substantial speedup. The speedup factor is close to 28 with two GPUs.
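
For readers unfamiliar with the term, the following is a minimal single-threaded sketch of a dense dynamic programming recurrence for the 0/1 knapsack problem. It is not the hybrid multi-GPU method of the paper; the function name `knapsack_dense` and the sample data are assumptions made for illustration only.

```python
def knapsack_dense(values, weights, capacity):
    """Best total value of a 0/1 knapsack, using a dense table over all capacities.

    Illustrative sketch only: integer weights and capacity are assumed.
    """
    # best[c] holds the best value achievable with total weight <= c
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Walk capacities downward so each item is counted at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


if __name__ == "__main__":
    print(knapsack_dense([60, 100, 120], [10, 20, 30], 50))
```

For a fixed item, each table entry depends only on values computed for the previous items, which is why the inner loop is a natural candidate for data-parallel execution.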
Oct, 26

Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs

A number of issues related to floating point accuracy and compliance are a frequent source of confusion on both CPUs and GPUs. The purpose of this white paper is to discuss the most common issues related to NVIDIA GPUs and to supplement the documentation in the CUDA C Programming Guide.
Oct, 26

A case study on porting scientific applications to GPU/CUDA

This paper proposes and describes a methodology developed to port complex scientific applications originally written in FORTRAN to nVidia CUDA. The significance of this lies in the fact that, despite the performance improvement and programmer-friendliness provided by CUDA, it presently lacks support for FORTRAN. The methodology described in this paper addresses this problem using a […]
Oct, 26

Quasars spectra classification with the help of GPU computing

Finding interesting celestial objects among tens of thousands or even millions of recorded raw data items is not an easy task. In this paper we speed up this process with Thrust, a high-level NVIDIA CUDA C++ template library, which makes our database with an R interface much more efficient.
Oct, 26

Efficient Probabilistic Latent Semantic Indexing using Graphics Processing Unit

In this paper, we propose a scheme to accelerate the Probabilistic Latent Semantic Indexing (PLSI), which is an automated document indexing method based on a statistical latent semantic model, exploiting the high parallelism of Graphics Processing Unit (GPU). Our proposal is composed of three techniques: the first one is to accelerate the Expectation-Maximization (EM) computation […]
Oct, 26

A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU

This paper proposes a new sparse matrix storage format which allows an efficient implementation of a sparse matrix vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it has both a low memory footprint and good throughput. The new format, which we call Sliced ELLR-T, has been designed specifically for accelerating the iterative […]
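
As background, here is a small sketch of a sparse matrix vector product using the plain ELLPACK (ELL) layout, in which every row is padded to the same number of stored entries. This is not the Sliced ELLR-T format proposed in the paper; the helper names `ell_from_dense` and `ell_spmv` are hypothetical, and a non-empty matrix is assumed.

```python
def ell_from_dense(dense):
    """Convert a non-empty dense matrix (list of rows) to padded ELL arrays."""
    # Every row is padded to the length of the longest row of nonzeros.
    width = max(sum(1 for x in row if x != 0) for row in dense)
    vals, cols = [], []
    for row in dense:
        nonzeros = [(x, j) for j, x in enumerate(row) if x != 0]
        pad = width - len(nonzeros)
        # Padding entries store value 0 and column 0, so they add nothing.
        vals.append([x for x, _ in nonzeros] + [0] * pad)
        cols.append([j for _, j in nonzeros] + [0] * pad)
    return vals, cols


def ell_spmv(vals, cols, x):
    """Compute y = A @ x for a matrix A stored in ELL form."""
    return [
        sum(v * x[j] for v, j in zip(row_vals, row_cols))
        for row_vals, row_cols in zip(vals, cols)
    ]


if __name__ == "__main__":
    vals, cols = ell_from_dense([[1, 0, 2], [0, 3, 0], [4, 0, 5]])
    print(ell_spmv(vals, cols, [1, 2, 3]))
```

The padding is what costs memory when row lengths vary widely, which is the kind of overhead that motivates more compact ELL variants.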
Oct, 26

Accelerated MD Program Using CUDA Technology

Molecular dynamics (MD) simulation has proven to be an important tool for studying structure as well as physical properties at the atomic level in materials science. However, it requires a huge amount of computing time, which limits the ability to treat large-scale simulations. In this paper we present a solution to speed up […]
Oct, 26

Evaluation of Speedup of Monte Carlo Calculations of Two Simple Reactor Physics Problems Coded for the GPU/CUDA Environment

Monte Carlo simulation is ideally suited for solving the Boltzmann neutron transport equation in inhomogeneous media. However, routine applications require the computation time to be reduced to hours and even minutes on a desktop system. The interest in adopting GPUs for Monte Carlo acceleration is rapidly mounting, fueled partially by the parallelism afforded by the latest […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
