Posts
Apr, 4
Novel GPU Implementation of Jacobi Algorithm for Karhunen-Loeve Transform of Dense Matrices
The Jacobi algorithm for the Karhunen-Loeve transform of a symmetric real matrix and its parallel implementation using the chess tournament algorithm are revisited in this paper. The impact of memory access patterns and the significance of memory coalescing on the performance of the GPU implementation of the parallel Jacobi algorithm are emphasized. Two novel memory access methods for the Jacobi […]
Apr, 4
Implementation of Decoders for LDPC Block Codes and LDPC Convolutional Codes Based on GPUs
With the belief propagation (BP) decoding algorithm, low-density parity-check (LDPC) codes can achieve near-Shannon-limit performance. LDPC codes can reach bit error rates (BERs) as low as $10^{-15}$ even at a small bit-energy-to-noise-power-spectral-density ratio ($E_{b}/N_{0}$). In order to evaluate the error performance of LDPC codes, simulators running on central processing units (CPUs) are […]
Apr, 3
Temporally Consistent Disparity and Optical Flow via Efficient Spatio-temporal Filtering
This paper presents a new efficient algorithm for computing temporally consistent disparity maps from video footage. Our method is motivated by recent work [1] that achieves high quality stereo results by smoothing disparity costs with a fast edge-preserving filter. This previous approach was designed to work with single static image pairs and does not maintain […]
Apr, 2
Towards Adaptive GPU Resource Management for Embedded Real-Time Systems
In this paper, we present two conceptual frameworks for GPU applications to adjust their task execution times based on total workload. These frameworks enable smart GPU resource management when many applications share GPU resources while the workloads of those applications vary. Application developers can explicitly adjust the number of GPU cores depending on their needs. […]
Apr, 2
An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units
This paper presents a study on a high-performance design for a block cipher algorithm implemented on modern many-core graphics processing units (GPUs). The recent emergence of VLSI technology makes it feasible to fabricate multiple processing cores on a single chip and enables general-purpose computation on a GPU (GPGPU). The GPU strategy offers significant performance improvements […]
Apr, 2
On the Cryptanalysis of Public-Key Cryptography
Nowadays, the most popular public-key cryptosystems are based on either the integer factorization or the discrete logarithm problem. The feasibility of solving these mathematical problems in practice is studied, and techniques are presented to speed up the underlying arithmetic on parallel architectures. The fastest known approach to solve the discrete logarithm problem in groups of elliptic […]
Apr, 2
GPU Programming Strategies and Trends in GPU Computing
Over the last decade, there has been a growing interest in the use of graphics processing units (GPUs) for nongraphics applications. From early academic proof-of-concept papers around the year 2000, the use of GPUs has now matured to a point where there are countless industrial applications. Together with the expanding use of GPUs, we have […]
Mar, 31
Nested Data-Parallelism on the GPU
Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs, but, because of their Single-Instruction-Multiple-Data (SIMD) architecture, they are hard to program. Most of the programs ported to GPUs thus far use traditional data-level parallelism, performing only operations that operate uniformly over vectors. Porting algorithms that do […]
Mar, 31
Distributed Password Cracking Platform
This project originates from the need for distribution when performing security testing-related password hash cracking. KPMG IT Advisory uses an MPI-supported John the Ripper cluster plus a separate system with several graphics cards for the cracking of password hashes. As they want to expand their operations, they wish to integrate GPU-capable machines with the current […]
Mar, 31
GHOST: GPGPU-Offloaded High Performance Storage I/O Deduplication for Primary Storage System
Data deduplication has been an effective way to eliminate redundant data, mainly in backup storage systems. Since recent primary storage systems in cloud services are also expected to contain redundant data, deduplication can bring significant cost savings to primary storage as well. However, the primary storage system has high performance requirements for several […]
Mar, 31
Multi-GPU parallelization of a 3D Bayesian CT algorithm and its application on real foam reconstruction with incomplete data set
A great number of image reconstruction algorithms, based on analytical filtered backprojection, are implemented for X-ray Computed Tomography (CT) [1,2]. The limits of these methods appear when the number of projections is small and/or not equidistributed around the object. This is the case, for example, in the dynamic study of fluids in foams, the […]
Mar, 31
GPGPU-Accelerated Instruction Accurate and Fast Simulation of Thousand-core Platforms
Future architectures will feature hundreds to thousands of simple processors and on-chip memories connected through a network-on-chip. Architectural simulators will remain the primary tools for design space exploration and performance (and power) evaluation of these massively parallel architectures. However, architectural simulation performance is a serious concern, as virtual platforms and simulation technology are not able to tackle […]