Posts
Mar, 12
A Study of Real-Time Lighting Effects
Realistic lighting is an incredibly complex problem. All surfaces scatter light to all other surfaces. Realistic lighting in volumes of fog or smoke is even more complex because each particle absorbs and scatters light. This problem has been approximated with many techniques but can take hours to produce a single image. Creating these images in […]
Mar, 11
GPU Accelerated Real-Time Object Detection on High Resolution Videos Using Modified Census Transform
This paper presents a novel GPU accelerated object detection system using CUDA. Because of its detection accuracy, speed and robustness to illumination variations, a boosting based approach with Modified Census Transform features is used. Results are given on the face detection problem for evaluation. Results show that even our single-GPU implementation can run in real-time […]
Mar, 11
Better speedups using simpler parallel programming for graph connectivity and biconnectivity
Speedups are demonstrated for finding the biconnected components of a graph: 9x to 33x on the Explicit Multi-Threading (XMT) many-core computing platform relative to the best serial algorithm, using a relatively modest silicon budget. Further evidence suggests that speedups of 21x to 48x are possible. For graph connectivity, we demonstrate that XMT outperforms two recent NVIDIA […]
Mar, 11
NUMA Data-Access Bandwidth Characterization and Modeling
Clusters of seemingly homogeneous compute nodes are increasingly heterogeneous within each node due to replication and distribution of node-level subsystems. This intra-node heterogeneity can adversely affect program execution performance by inflicting additional data-access performance penalties when accessing non-local data. In many modern NUMA architectures, both memory and I/O controllers are distributed within a node and […]
Mar, 11
An Algorithm for Fast Edit Distance Computation on GPUs
Finding the edit distance between two sequences, and the closely related problem of finding their longest common subsequence, are important problems with applications in many domains, such as virus scanners, security kernels, natural language translation and genome sequence alignment. The traditional dynamic-programming based algorithm is hard to parallelize on SIMD processors as the algorithm is […]
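For readers unfamiliar with the traditional dynamic-programming formulation mentioned in this excerpt, here is a minimal CPU sketch in Python. It is the textbook Levenshtein recurrence with unit costs, not the GPU algorithm from the paper; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute).

    Textbook row-by-row dynamic programming, keeping only two rows.
    This is an illustrative CPU sketch, not the paper's GPU method.
    """
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

In this form each cell `curr[j]` reads `curr[j - 1]`, the cell just computed in the same row, so the inner loop cannot simply be split across parallel lanes as written.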
Mar, 11
GPU Path Tracing
The goal of this work is to verify whether the GPU can be used for global illumination computations in a commercial software environment and to explore an efficient way to do so. Path tracing with a BVH as the acceleration data structure was successfully implemented on the GPU using CUDA. It was arranged as a pipelined structure which supported […]
Mar, 10
Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling
Though the GPGPU concept is well-known in image processing, much more work remains to be done to fully exploit GPUs as an alternative computation engine. This paper investigates the computation-to-core mapping strategies to probe the efficiency and scalability of the robust facet image modeling algorithm on GPUs. Our fine-grained computation-to-core mapping scheme shows a significant […]
Mar, 10
Acceleration of Solving Maxwell’s Equations Using Cluster of GPUs
Finite difference time domain (FDTD) is a numerical method for solving differential equations such as Maxwell’s equations. Simulating these equations normally takes a very long time, and there has been a great effort to reduce it. The most recent and useful way to reduce the simulation time is to use GPUs. Graphical processing […]
Mar, 10
CUDA Accelerated Face Recognition Using Local Binary Patterns
In this paper, we present a GPU accelerated face recognition framework using CUDA. We use weighted regional LBP histograms as features and the k-nearest neighbour (k-NN) algorithm for classification. Our first contribution is to present an efficient way to compute LBP values from an input image and construct weighted regional LBP histograms on the GPU using a […]
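As background for the LBP features mentioned in this excerpt, the following is a minimal CPU sketch of the basic 3x3 local binary pattern and a plain 256-bin histogram. It is not the paper's CUDA implementation and omits the regional weighting; the function names, the clockwise bit ordering and the `>=` comparison are assumptions made for illustration.

```python
# Clockwise neighbour offsets (dy, dx), starting at the top-left pixel.
# The starting point and direction are an assumed convention.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(img):
    """Return basic 8-neighbour LBP codes for the interior pixels of img.

    img is a list of equal-length rows of grey levels. Border pixels
    are skipped, so the result is (h - 2) x (w - 2).
    """
    h, w = len(img), len(img[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as bright
                # as the centre pixel.
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            out[y - 1][x - 1] = code
    return out


def lbp_histogram(codes):
    """Count how often each of the 256 possible LBP codes occurs."""
    hist = [0] * 256
    for row in codes:
        for value in row:
            hist[value] += 1
    return hist


if __name__ == "__main__":
    codes = lbp_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(codes)
```

A regional variant would split the image into blocks and compute one such histogram per block. Each pixel's code depends only on its own 3x3 neighbourhood, so the codes can be computed independently of one another.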
Mar, 10
GPU-Accelerated Large-Eddy Simulation of Turbulent Channel Flows
High performance computing clusters augmented with cost- and power-efficient graphics processing units (GPUs) provide new opportunities to broaden the use of the large-eddy simulation technique to study high Reynolds number turbulent flows in fluids engineering applications. In this paper, we extend our earlier work on multi-GPU acceleration of an incompressible Navier-Stokes solver to […]
Mar, 10
Multi-Object Geodesic Active Contours (MOGAC): A Parallel Sparse-Field Algorithm for Image Segmentation
An important task for computer vision systems is to segment adjacent structures in images without producing gaps or overlaps. Multi-object Level Set Methods (MLSM) perform this task with the benefit of sub-pixel accuracy. However, current implementations of MLSM are not as computationally or memory efficient as their region growing and graph cut counterparts which lack […]
Mar, 9
Asynchronous Parallel Computing Model of Global Motion Estimation with CUDA
For video coding, weighing the balance between coding rate and image quality, we apply a global motion search algorithm to avoid loss of image quality and use the parallel computing capacity of graphics processors to accelerate the encoding process. Based on the heterogeneous CPU+GPU system and the multi-threaded parallel structure and thread synchronization features of the CUDA platform, we […]