Posts
Jan, 18
Blasting through lattice calculations using CUDA
Modern graphics hardware is designed for highly parallel numerical tasks and provides significant cost and performance benefits. Graphics hardware vendors are now making available development tools to support general purpose high performance computing. Nvidia’s CUDA platform, in particular, offers direct access to graphics hardware through a programming language similar to C. Using the CUDA platform […]
Jan, 18
An exploration of CUDA and CBEA for a gravitational wave data-analysis application (Einstein@Home)
We present a detailed approach for making use of two new computer hardware architectures, CBEA and CUDA, for accelerating a scientific data-analysis application (Einstein@Home). Our results suggest that both architectures suit the application quite well and that the performance achievable within the same software development time frame is nearly identical.
Jan, 18
A novel approach for implementing Steganography with computing power obtained by combining Cuda and Matlab
With the current development of multiprocessor systems, efforts to compute data on such processors have also increased exponentially. If multi-core processors are not fully utilized, then even though the computing power exists, the speed is not available to end users for their respective applications. In accordance with this, the users or […]
Jan, 18
A Multi-Stage CUDA Kernel for Floyd-Warshall
We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the previously best reported algorithm. In order to achieve this speedup, we applied a new technique to reduce usage of on-chip shared memory and allow the CUDA scheduler to more effectively hide instruction […]
Jan, 18
Fast GPGPU Data Rearrangement Kernels using CUDA
Many high-performance computing algorithms are bandwidth-limited, hence the need for optimal data rearrangement kernels as well as their easy integration into the rest of the application. In this work, we have built a CUDA library of fast kernels for a set of data rearrangement operations. In particular, we have built generic kernels for rearranging […]
Jan, 18
Parallelization of Weighted Sequence Comparison by using EBWT
The Extended Burrows-Wheeler Transform (EBWT) helps to find the distance between two sequences. An implementation of an existing algorithm takes a considerable amount of time for small sequences. In this paper, we give a parallel implementation of this algorithm using NVIDIA Compute Unified Device Architecture (CUDA). We have obtained, on average, a 2x improvement […]
Jan, 18
GPGPUs in computational finance: Massive parallel computing for American style options
The pricing of American-style and multiple-exercise options is a very challenging problem in mathematical finance. One usually employs a Least-Squares Monte Carlo approach (the Longstaff-Schwartz method) to evaluate the conditional expectations that arise in the Backward Dynamic Programming principle for such optimal stopping or stochastic control problems in a Markovian framework. Unfortunately, these […]
Jan, 18
Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision
Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface. We describe how GPUs can be used to considerably speed up molecular dynamics (MD) simulations for system sizes ranging up to about 1 million particles. Particular emphasis is put on the numerical long-time stability in terms […]
Jan, 17
Automatic bi-layer video segmentation based on sensor fusion
We propose a new solution to the problem of bi-layer video segmentation in terms of both hardware design and algorithmic solution. At the data acquisition stage, we combine color video with infrared video, which is robust to illumination changes and provides an automatic initialization of the cue map for foreground-background segmentation. Two algorithms are presented […]
Jan, 17
Accurate multi-view reconstruction using robust binocular stereo and surface meshing
This paper presents a new algorithm for multi-view reconstruction that demonstrates both accuracy and efficiency. Our method is based on robust binocular stereo matching, followed by adaptive point-based filtering of the merged point clouds, and efficient, high-quality mesh generation. All aspects of our method are designed to be highly scalable with the number of views. […]
Jan, 17
GPU-based Collision Detection for Deformable Parameterized Surfaces
Building on the potential of current programmable GPUs, several approaches have recently been developed that use the GPU to calculate surface deformations, such as the folding of cloth, or to convert higher-level geometry, such as NURBS or subdivision surfaces, to renderable primitives. These algorithms are realized as a per-frame operation and take advantage of the parallel […]
Jan, 17
Visual Simulation of Flow
We have adopted a numerical method from computational fluid dynamics, the Lattice Boltzmann Method (LBM), for real-time simulation and visualization of flow and amorphous phenomena, such as clouds, smoke, fire, haze, dust, radioactive plumes, and airborne biological or chemical agents. Unlike other approaches, LBM discretizes the microphysics of local interactions and can handle very complex […]