Posts
Jan, 10
High-speed volume ray casting with CUDA
Volume ray casting has experienced renewed interest in the last decade, largely due to graphics hardware that enabled real-time implementations competitive in speed with slicing. However, these implementations require specialized shader languages and graphics APIs, which makes advanced methods difficult to implement and hinders performance, bending the programming and execution […]
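Front-to-back compositing is the core loop of a volume ray caster. In a CUDA implementation each thread typically marches one ray. The minimal sketch below runs that loop sequentially and rests on two assumptions: a hypothetical volume (`sphere_density`, a unit sphere of constant density) and a hypothetical Beer-Lambert transfer function (`transfer`). It is an illustration, not the paper's implementation.

```python
import math

def sphere_density(x, y, z):
    # Hypothetical volume: a unit sphere of constant density 1.0 at the origin.
    return 1.0 if x * x + y * y + z * z <= 1.0 else 0.0

def transfer(density, step, sigma=5.0):
    # Hypothetical transfer function: grey colour equal to the density and
    # per-sample opacity from Beer-Lambert absorption over one step.
    return density, 1.0 - math.exp(-sigma * density * step)

def cast_ray(origin, direction, t_max, step=0.01, cutoff=0.99):
    """March one ray front to back and return (colour, opacity)."""
    color, alpha = 0.0, 0.0
    t = 0.0
    while t < t_max:
        x, y, z = (o + t * d for o, d in zip(origin, direction))
        c, a = transfer(sphere_density(x, y, z), step)
        # Front-to-back "over" compositing.
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        if alpha >= cutoff:
            break  # early ray termination: later samples would be hidden
        t += step
    return color, alpha

print(cast_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), 4.0))  # ray through the sphere
print(cast_ray((0.0, 2.0, -2.0), (0.0, 0.0, 1.0), 4.0))  # misses: (0.0, 0.0)
```

Early ray termination is what lets front-to-back order skip work. Once accumulated opacity reaches the cutoff, further samples cannot change the pixel noticeably.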
Jan, 10
Parallel drainage network computation on CUDA
Drainage network determination from Digital Elevation Models (DEMs) has been widely studied over the last three decades. During this time, satellite technology has steadily improved digitized imagery, and computers have increased their capacity to manage such huge quantities of information. The rapid growth of CPU power and memory size […]
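The abstract does not say which flow-routing rule the paper uses. A common choice is the D8 rule: each DEM cell drains to whichever of its eight neighbours gives the steepest downhill slope, and flow accumulation counts how many cells drain through each cell. The minimal sequential sketch below assumes D8, unit cell spacing, and that cells with no lower neighbour are sinks. The direction step is independent per cell, so it would map naturally to one GPU thread per cell.

```python
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_directions(dem):
    """Return, per cell, the (row, col) it drains to, or None for a sink."""
    rows, cols = len(dem), len(dem[0])
    dirs = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Slope = elevation drop / distance (diagonals are sqrt(2) away).
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (nr, nc), slope
            dirs[r][c] = best
    return dirs

def flow_accumulation(dirs):
    """Count the cells (including itself) whose flow passes through each cell."""
    rows, cols = len(dirs), len(dirs[0])
    acc = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cell = (r, c)
            # Strictly downhill steps cannot form cycles, so this walk ends at a sink.
            while cell is not None:
                acc[cell[0]][cell[1]] += 1
                cell = dirs[cell[0]][cell[1]]
    return acc

dem = [[9, 8, 7],
       [8, 5, 4],
       [7, 4, 1]]
dirs = d8_directions(dem)
print(dirs[0][0])                     # (1, 1)
print(flow_accumulation(dirs)[2][2])  # 9: every cell drains to the outlet
```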
Jan, 10
Canny edge detection on NVIDIA CUDA
The Canny edge detector is a very popular and effective edge feature detector used as a pre-processing step in many computer vision algorithms. It is a multi-step detector that performs smoothing and filtering and non-maxima suppression, followed by a connected-component analysis stage that detects “true” edges while suppressing “false” non-edge filter responses. While […]
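The connected-component stage of Canny is hysteresis thresholding. Pixels above a high threshold are strong edges, and pixels above a low threshold are weak. A weak pixel survives only if it is 8-connected to a strong one. The minimal sequential sketch below covers that stage alone. It assumes the gradient magnitudes have already been smoothed and thinned by non-maxima suppression, and the thresholds are illustrative.

```python
from collections import deque

def hysteresis(mag, low, high):
    """Return a boolean edge map: strong pixels plus weak pixels connected to them."""
    rows, cols = len(mag), len(mag[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every strong pixel.
    for r in range(rows):
        for c in range(cols):
            if mag[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))
    # Breadth-first growth into 8-connected weak pixels.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols and not edges[nr][nc]
                        and mag[nr][nc] >= low):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges

mag = [[0, 0, 0, 0, 0],
       [0, 9, 5, 5, 0],
       [0, 0, 0, 0, 0],
       [0, 5, 0, 0, 0]]
edges = hysteresis(mag, low=4, high=8)
print(edges[1])  # [False, True, True, True, False]
print(edges[3])  # [False, False, False, False, False]: isolated weak pixel dropped
```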
Jan, 10
Molecular Dynamics Simulations on Commodity GPUs with CUDA
Molecular dynamics simulations are a common and often repeated task in molecular biology. The need to speed up these simulations comes from the requirement to simulate large systems with many atoms over numerous time steps. In this paper we present a new approach to high-performance molecular dynamics simulations on graphics processing units. Using modern […]
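The abstract does not give the force model, so the sketch below is an illustration only. It uses the common Lennard-Jones pair potential in reduced units (ε = σ = 1, unit masses, no cutoff or periodic boundaries), an all-pairs force loop, and the velocity Verlet integrator. The all-pairs loop is the part that maps naturally onto one GPU thread per atom.

```python
def lj_forces(pos):
    """All-pairs Lennard-Jones forces and potential energy (epsilon = sigma = 1)."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r2 = sum(x * x for x in d)
            inv6 = 1.0 / r2 ** 3
            energy += 4.0 * (inv6 * inv6 - inv6)
            # F_i = 24 (2/r^12 - 1/r^6) / r^2 * d; atom j gets the opposite force.
            scale = 24.0 * (2.0 * inv6 * inv6 - inv6) / r2
            for k in range(3):
                forces[i][k] += scale * d[k]
                forces[j][k] -= scale * d[k]
    return forces, energy

def velocity_verlet(pos, vel, dt, steps):
    """Advance unit-mass atoms in place; return the total energy after each step."""
    forces, pot = lj_forces(pos)
    energies = []
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k]  # half kick
                pos[i][k] += dt * vel[i][k]           # drift
        forces, pot = lj_forces(pos)
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k]  # second half kick
        kin = 0.5 * sum(v * v for vi in vel for v in vi)
        energies.append(kin + pot)
    return energies

pos = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
e0 = lj_forces(pos)[1]  # atoms start at rest, so total energy = potential
energies = velocity_verlet(pos, vel, dt=0.001, steps=1000)
print(max(abs(e - e0) for e in energies))  # energy drift over the run
```

Tracking total energy is a cheap sanity check on the integrator and time step. Velocity Verlet keeps the drift small for a sufficiently small `dt`.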
Jan, 10
Comparing Hardware Accelerators in Scientific Applications: A Case Study
Multi-core processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We present a performance, design methodology, platform, and architectural comparison of several application accelerators executing a Quantum Monte Carlo application. We compare the application’s performance and programmability on a variety of platforms including CUDA with Nvidia GPUs, Brook+ […]
Jan, 10
Neville elimination on multi- and many-core systems: OpenMP, MPI and CUDA
This paper describes several parallel algorithmic variations of Neville elimination. This method solves a system of linear equations by creating zeros in a matrix column, adding to each row a suitable multiple of the preceding one. The parallel algorithms are run and compared on different multi- and many-core platforms using parallel programming techniques such as […]
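The elimination step described in the abstract can be sketched sequentially as follows. This minimal illustration assumes no zero pivot is met, which holds for example for totally positive matrices such as the Vandermonde matrix used below. It therefore omits the row exchanges a general implementation needs. Back substitution then solves the resulting upper-triangular system.

```python
def neville_solve(a, b):
    """Solve a x = b by Neville elimination followed by back substitution.

    Assumes a[i-1][k] != 0 whenever a[i][k] must be eliminated (no row exchanges).
    """
    n = len(a)
    a = [row[:] for row in a]  # work on copies
    b = b[:]
    for k in range(n - 1):
        # Bottom-up, so row i-1 still holds its step-k values when row i uses it.
        for i in range(n - 1, k, -1):
            if a[i][k] == 0:
                continue
            m = a[i][k] / a[i - 1][k]
            for j in range(k, n):
                a[i][j] -= m * a[i - 1][j]
            b[i] -= m * b[i - 1]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

print(neville_solve([[1, 1, 1], [1, 2, 4], [1, 3, 9]], [6, 17, 34]))  # [1.0, 2.0, 3.0]
```

Within one step `k`, every row update reads only step-`k` values. The updates for different rows are therefore independent, and that independence is what the parallel variants can exploit.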
Jan, 10
Importance sampling algorithms for first passage time probabilities in the infinite server queue
This paper applies importance sampling simulation for estimating rare event probabilities of the first passage time in the infinite server queue with renewal arrivals and general service time distributions. We consider importance sampling algorithms which are based on large deviations results of the infinite server queue, and we consider an algorithm based on the cross-entropy […]
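The paper's queue-specific estimators are not reproduced here. The sketch below only illustrates the underlying idea of importance sampling on a toy rare event: P(X > t) for X ~ Exp(1), whose exact value e^(-t) is known. Samples are drawn from a heavier-tailed exponential proposal, and each hit is weighted by the likelihood ratio f(x)/g(x). The proposal rate 1/t is an assumed choice for this toy problem.

```python
import math
import random

def is_tail_probability(t, n, seed=0):
    """Estimate P(X > t), X ~ Exp(1), by sampling from Exp(1/t) instead."""
    rng = random.Random(seed)
    theta = 1.0 / t  # proposal rate (assumed choice)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(theta)
        if x > t:
            # Likelihood ratio f(x)/g(x) = e^{-x} / (theta e^{-theta x}).
            total += math.exp(-(1.0 - theta) * x) / theta
    return total / n

t = 20.0
print(is_tail_probability(t, 100_000), math.exp(-t))  # estimate vs exact
```

The target probability here is about 2e-9, so plain Monte Carlo with the same 100,000 samples would almost certainly return 0. The reweighted proposal places many samples in the rare region and corrects for the bias through the weights.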
Jan, 10
Dense optical flow by iterative local window registration
We study dense optical flow estimation using iterative registration of local windows, also known as iterative Lucas-Kanade (LK) [B. Lucas et al, 1981]. We show that the usual iterative-warping scheme encounters divergence problems and propose a modified scheme with better behavior. It yields good results at a much lower cost than the exact dense LK […]
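Iterative Lucas-Kanade registers a window by repeatedly warping the second image with the current displacement estimate, then solving a linearised least-squares problem for an update. The minimal 1D, single-window sketch below shows that Gauss-Newton iteration. It assumes synthetic sine signals, linear interpolation for the warp, and central differences for the gradient. It is not the paper's modified dense scheme.

```python
import math

def interp(signal, x):
    """Linearly interpolate a sampled 1D signal at real position x."""
    i = int(math.floor(x))
    f = x - i
    return (1.0 - f) * signal[i] + f * signal[i + 1]

def lk_1d(I, J, window, d=0.0, iters=20):
    """Estimate d such that J(x + d) ~ I(x) for all x in the window."""
    for _ in range(iters):
        num = den = 0.0
        for x in window:
            warped = interp(J, x + d)  # warp J with the current estimate
            grad = 0.5 * (interp(J, x + d + 1.0) - interp(J, x + d - 1.0))
            num += grad * (I[x] - warped)
            den += grad * grad
        step = num / den  # least-squares update of the linearised problem
        d += step
        if abs(step) < 1e-8:
            break
    return d

I = [math.sin(2 * math.pi * x / 32) for x in range(96)]
J = [math.sin(2 * math.pi * (x - 1.3) / 32) for x in range(96)]  # I shifted by 1.3
print(round(lk_1d(I, J, range(16, 80)), 3))  # close to 1.3
```

Each iteration re-warps `J` and re-linearises around the new estimate. The abstract reports that this warping loop can diverge in the dense setting, which is the behaviour its modified scheme addresses.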
Jan, 10
CUDA-Based Radiative Transfer Method with Application to the EM Scattering from a Two-Layer Canopy Model
Computing the scattering contributions of a large number of samples in a vegetation canopy imposes an intensive computational burden. This severely limits the traditional serial algorithm, based on radiative transfer theory, for evaluating electromagnetic (EM) scattering from vegetation. Nevertheless, the Compute […]
Jan, 10
Connected component labeling on a 2D grid using CUDA
Connected component labeling is an important but computationally expensive operation required in many fields of research. The goal of the present work is to label connected components on a 2D binary map. Two different iterative algorithms for this task are presented. The first algorithm (Row-Col Unify) is based on directional propagation labeling, whereas […]
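The abstract is truncated before it details either algorithm. A simple data-parallel baseline for the same task is iterative label propagation. Every foreground cell starts with a unique label and repeatedly takes the minimum label among itself and its foreground neighbours until nothing changes. On a GPU, each pass would be one kernel launch with one thread per cell. The sketch below runs the passes sequentially, reading old labels and writing new ones, and assumes 4-connectivity. It is not the paper's Row-Col Unify algorithm.

```python
def label_components(grid):
    """Label 4-connected foreground (1) cells of a binary grid; background gets 0."""
    rows, cols = len(grid), len(grid[0])
    # Unique initial label per foreground cell: its linear index + 1.
    labels = [[r * cols + c + 1 if grid[r][c] else 0 for c in range(cols)]
              for r in range(rows)]
    changed = True
    while changed:
        changed = False
        new = [row[:] for row in labels]  # read old labels, write new ones
        for r in range(rows):
            for c in range(cols):
                if not grid[r][c]:
                    continue
                best = labels[r][c]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc]:
                        best = min(best, labels[nr][nc])
                if best < new[r][c]:
                    new[r][c] = best
                    changed = True
        labels = new
    return labels

grid = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [1, 0, 0, 1]]
for row in label_components(grid):
    print(row)
# [1, 1, 0, 0]
# [0, 1, 0, 8]
# [9, 0, 0, 8]
```

The number of passes grows with the longest path inside a component, so plain propagation can need many launches on large maps. Iterative CUDA labeling algorithms aim to reduce that pass count.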
Jan, 9
Tapping the supercomputer under your desk: solving dynamic equilibrium models with graphics processors?
This paper shows how to build algorithms that use graphics processing units (GPUs) installed in most modern computers to solve dynamic equilibrium models in economics. In particular, we rely on the compute unified device architecture (CUDA) of NVIDIA GPUs. We illustrate the power of the approach by solving a simple real business cycle model with […]
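Dynamic equilibrium models of this kind are commonly solved by value function iteration on a grid. The maximisation over next-period capital at each grid point is independent across points, which makes the method suit a GPU. The minimal sequential sketch below assumes a deterministic growth model with log utility, output k^α and full depreciation, for which the exact policy k' = αβk^α is known and can be used as a check. The parameters and grid are illustrative, and this is not the paper's stochastic real business cycle model.

```python
import math

def solve_growth_vfi(alpha=0.3, beta=0.9, n=101, k_min=0.05, k_max=0.3, tol=1e-6):
    """Value function iteration for V(k) = max_k' log(k^alpha - k') + beta V(k')."""
    grid = [k_min + i * (k_max - k_min) / (n - 1) for i in range(n)]
    # Utility of each (k, k') pair; infeasible choices get -inf.
    util = [[math.log(k ** alpha - kp) if k ** alpha > kp else -math.inf
             for kp in grid] for k in grid]
    value = [0.0] * n
    while True:
        new_value, policy = [], []
        for i in range(n):
            # Independent maximisation per grid point (one GPU thread each).
            j_best = max(range(n), key=lambda j: util[i][j] + beta * value[j])
            new_value.append(util[i][j_best] + beta * value[j_best])
            policy.append(grid[j_best])
        diff = max(abs(a - b) for a, b in zip(new_value, value))
        value = new_value
        if diff < tol:  # Bellman operator is a beta-contraction
            return grid, policy

grid, policy = solve_growth_vfi()
i = 50
print(grid[i], policy[i], 0.3 * 0.9 * grid[i] ** 0.3)  # grid policy vs exact policy
```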
Jan, 9
Parallel Prefix Sum (Scan) with CUDA
Parallel prefix sum, also known as parallel Scan, is a useful building block for many parallel algorithms including sorting and building data structures. In this document we introduce Scan and describe step-by-step how it can be implemented efficiently in NVIDIA CUDA. We start with a basic naive algorithm and proceed through more advanced techniques to […]
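A common work-efficient formulation is Blelloch's up-sweep/down-sweep exclusive scan. The sketch below simulates its parallel steps sequentially. Each inner `for` loop corresponds to one step whose iterations could run as independent GPU threads. It assumes zero-padding to a power-of-two length and omits the shared-memory and bank-conflict details a CUDA kernel would need.

```python
def blelloch_exclusive_scan(values):
    """Work-efficient exclusive prefix sum (up-sweep, then down-sweep)."""
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    a = list(values) + [0] * (size - n)  # pad with the identity element
    # Up-sweep (reduce): build partial sums in a balanced binary tree.
    stride = 2
    while stride <= size:
        for k in range(0, size, stride):  # independent: one thread each
            a[k + stride - 1] += a[k + stride // 2 - 1]
        stride *= 2
    # Down-sweep: clear the root, then push prefixes back down the tree.
    a[size - 1] = 0
    stride = size
    while stride >= 2:
        for k in range(0, size, stride):
            left = k + stride // 2 - 1
            right = k + stride - 1
            a[left], a[right] = a[right], a[right] + a[left]
        stride //= 2
    return a[:n]

print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

The naive (Hillis-Steele style) scan performs O(n log n) additions. This tree-based version performs O(n), which is why it is called work-efficient.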