Posts
Dec, 7
Sparse-Matrix-CG-Solver in CUDA
This paper describes the implementation of a parallelized conjugate gradient solver for linear equation systems using CUDA-C. Given a real, symmetric and positive definite coefficient matrix and a right-hand side, the parallelized CG solver is able to find a solution for that system by exploiting the massive compute power of today's GPUs. Comparing sequential CPU implementations […]
Dec, 7
Accelerating Braided B+ Tree Searches on a GPU with CUDA
Previous work has shown that using the GPU as a brute force method for SELECT statements on a SQLite database table yields significant speedups. However, this requires that the entire table be selected and transformed from the B-Tree to row-column format. This paper investigates possible speedups by traversing B+ Trees in parallel on the GPU, […]
Dec, 7
GPU-based solution of Continuous Time Markov Chains using CUSP
This technical report describes the parallelisation of the response-time analyser HYDRA using CUSP and the results of executing it on HECToR’s GPGPU testbed. We achieved good speed-ups in execution time, but these were outweighed by increased setup time.
Dec, 7
Effective Mapping of Grammatical Evolution to CUDA Hardware Model
Several papers have shown that symbolic regression is suitable for data analysis and prediction in finance markets. Grammatical Evolution (GE) has been successfully applied in solving various tasks including symbolic regression. However, the performance of this method can limit the area of possible applications. This paper deals with utilizing mainstream graphics processing unit (GPU) for […]
Dec, 7
Efficient Two-Level Preconditioned Conjugate Gradient Method on the GPU
We present an implementation of the Two-Level Preconditioned Conjugate Gradient Method for the GPU. We investigate a Truncated Neumann Series based preconditioner in combination with deflation and compare it with Block Incomplete Cholesky schemes. This combination exhibits fine-grain parallelism and hence we gain considerably in execution time. Its numerical performance is also comparable to the Block […]
Dec, 7
Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers
We describe a GPU- and multicore-oriented implementation technique for a key component of finite element based simulation toolkits for partial differential equations on unstructured grids: Geometric Multigrid solvers. We use efficient sparse matrix-vector multiplications throughout the solver pipeline: within the coarse-grid solver, smoothers and even grid transfers. Our implementation can handle several low- and high-order […]
Dec, 6
Automatic Fusions of CUDA-GPU Kernels for Parallel Map
When implementing a function mapping on a contemporary GPU, several contradictory performance factors affecting the distribution of computation into GPU kernels have to be balanced. A decomposition-fusion scheme suggests decomposing the computational problem into several simple functions implemented as standalone kernels, and later fusing some of these functions into more complex kernels to […]
Dec, 6
Multiprocessing Acceleration of H.264/AVC Motion Estimation Full Search Algorithm under CUDA Architecture
This work presents a parallel GPU-based solution for the Motion Estimation (ME) process in a video encoding system. We propose a way to partition the steps of the Full Search block matching algorithm in the CUDA architecture, and compare the performance with a theoretical model and two implementations (sequential and parallel using the OpenMP library). We obtained […]
Dec, 6
DTAM: Dense tracking and mapping in real-time
DTAM is a system for real-time camera tracking and reconstruction which relies not on feature extraction but on dense, every-pixel methods. As a single hand-held RGB camera flies over a static scene, we estimate detailed textured depth maps at selected keyframes to produce a surface patchwork with millions of vertices. We use the hundreds of […]
Dec, 6
Massively Parallel Identification of Intersection Points for GPGPU Ray Tracing
The latest advancements in computer graphics architectures, such as the replacement of some fixed stages of the pipeline with programmable stages (shaders), have enabled the development of parallel general purpose applications on massively parallel graphics architectures (Streaming Processors). For years the graphics processing unit (GPU) has been optimized for increasingly high throughput of massively parallel […]
Dec, 6
Simulation of pollutant transport in shallow water on a CUDA architecture
Shallow water simulation enables the study of problems such as dam break, river, canal and coastal hydrodynamics, as well as the transport of inert substances, such as pollutants, in a fluid. This article describes an efficient and cost-effective CUDA implementation on the GPU of a finite volume numerical scheme for solving pollutant transport problems in two-dimensional domains. […]
Dec, 6
GPU-Based Liquid Crystal Display Processing Platform
In the past decade, liquid crystal displays (LCDs) have taken over the television (TV) and monitor market from cathode ray tube (CRT) displays. Compared to CRT displays, LCDs offer larger screen sizes and higher resolution, and are thinner, lighter, and more energy efficient. However, with respect to image quality, LCDs have not caught up to CRT displays in […]