Posts
Dec, 7
GPU Implementation of the Keccak Hash Function Family
Hash functions are one of the most important cryptographic primitives. Some of the currently employed hash functions like SHA-1 or MD5 are considered broken today. Therefore, in 2007 the US National Institute of Standards and Technology announced a competition for a new family of hash functions. Keccak is one of the five final candidates to […]
Dec, 7
Parallelizing AES on multicores and GPUs
The AES block cipher is widely used and resource intensive. An existing sequential open source implementation of the algorithm was parallelized on multi-core microprocessors and GPUs. Performance results are presented.
Dec, 7
An Efficient Parallel Motion Estimation Algorithm and X264 Parallelization in CUDA
H.264/AVC video encoders have been widely used for their high coding efficiency. Since the computational demand, which is proportional to the frame resolution, is constantly increasing, it has been of great interest to accelerate H.264/AVC by parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general purpose applications by exploiting fine-grain […]
Dec, 7
Sparse-Matrix-CG-Solver in CUDA
This paper describes the implementation of a parallelized conjugate gradient solver for linear equation systems using CUDA-C. Given a real, symmetric and positive definite coefficient matrix and a right-hand side, the parallelized CG solver is able to find a solution for that system by exploiting the massive compute power of today's GPUs. Comparing sequential CPU implementations […]
Dec, 7
Accelerating Braided B+ Tree Searches on a GPU with CUDA
Previous work has shown that using the GPU as a brute force method for SELECT statements on a SQLite database table yields significant speedups. However, this requires that the entire table be selected and transformed from the B-Tree to row-column format. This paper investigates possible speedups by traversing B+ Trees in parallel on the GPU, […]
Dec, 7
GPU-based solution of Continuous Time Markov Chains using CUSP
This technical report describes the parallelisation of the response-time analyser HYDRA using CUSP and the results of executing it on HECToR’s GPGPU testbed. We achieved good speed-ups in execution time, but these were outweighed by increased setup time.
Dec, 7
Effective Mapping of Grammatical Evolution to CUDA Hardware Model
Several papers have shown that symbolic regression is suitable for data analysis and prediction in finance markets. Grammatical Evolution (GE) has been successfully applied in solving various tasks including symbolic regression. However, the performance of this method can limit the area of possible applications. This paper deals with utilizing a mainstream graphics processing unit (GPU) for […]
Dec, 7
Efficient Two-Level Preconditioned Conjugate Gradient Method on the GPU
We present an implementation of the Two-Level Preconditioned Conjugate Gradient Method for the GPU. We investigate a Truncated Neumann Series based preconditioner in combination with deflation and compare it with Block Incomplete Cholesky schemes. This combination exhibits fine-grain parallelism and hence we gain considerably in execution time. Its numerical performance is also comparable to the Block […]
Dec, 7
Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers
We describe a GPU- and multicore-oriented implementation technique for a key component of finite element based simulation toolkits for partial differential equations on unstructured grids: Geometric Multigrid solvers. We use efficient sparse matrix-vector multiplications throughout the solver pipeline: within the coarse-grid solver, smoothers and even grid transfers. Our implementation can handle several low- and high-order […]
Dec, 6
Automatic Fusions of CUDA-GPU Kernels for Parallel Map
When implementing a function mapping on a contemporary GPU, several contradictory performance factors affecting the distribution of computation into GPU kernels have to be balanced. A decomposition-fusion scheme suggests decomposing the computational problem to be solved into several simple functions implemented as standalone kernels, and later fusing some of these functions into more complex kernels to […]
Dec, 6
Multiprocessing Acceleration of H.264/AVC Motion Estimation Full Search Algorithm under CUDA Architecture
This work presents a parallel GPU-based solution for the Motion Estimation (ME) process in a video encoding system. We propose a way to partition the steps of the Full Search block matching algorithm in the CUDA architecture, and to compare the performance with a theoretical model and two implementations (sequential and parallel using the OpenMP library). We obtained […]
Dec, 6
DTAM: Dense tracking and mapping in real-time
DTAM is a system for real-time camera tracking and reconstruction which relies not on feature extraction but on dense, every-pixel methods. As a single hand-held RGB camera flies over a static scene, we estimate detailed textured depth maps at selected keyframes to produce a surface patchwork with millions of vertices. We use the hundreds of […]