Posts
Apr, 8
Record Setting Software Implementation of DES Using CUDA
The increase in computational power of off-the-shelf hardware offers more and more advantageous tradeoffs among efficiency, cost and availability, thus enhancing the feasibility of cryptanalytic attacks aiming to lower the security of widely used cryptosystems. In this paper we illustrate a GPU-based software implementation of the most efficient variant of the Data Encryption Standard (DES), […]
Apr, 8
On accelerating iterative algorithms with CUDA: A case study on Conditional Random Fields training algorithm for biological sequence alignment
The accuracy of Conditional Random Fields (CRF) is achieved at the cost of a huge amount of computation to train the model. In this paper we design a parallelized algorithm for Gradient Ascent-based CRF training methods for biological sequence alignment. Our contribution lies mainly in two aspects: 1) We flexibly parallelized the different iterative computation […]
Apr, 8
A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation
In this paper, we describe the implementation of gravitational force calculation for N-body simulations in the context of astrophysics. We describe high-performance implementations on general purpose processors, GPUs, and FPGAs, and compare them using a number of criteria including speed, power efficiency and cost of development. These results show that, for gravitational […]
Apr, 8
Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+
Graphics Processing Units (GPUs), with many lightweight data-parallel cores, can provide substantial parallel computing power to accelerate several general purpose applications. Both AMD and NVIDIA provide their own high-performance GPUs and software platforms. As floating-point computing capacity increases continually, the “memory wall” problem becomes more serious, especially for array-intensive applications. In […]
Apr, 8
CANSCID-CUDA
The 2010 MEMOCODE Hardware Software Co-design challenge is to implement a Deep Packet Inspection architecture called CANSCID (Combined Architecture for Stream Categorization and Intrusion Detection). In this short paper, we present the design details of our submission, which uses a Graphics Processing Unit (GPU) to accelerate parallel regular expression matching. The target […]
Apr, 8
Shape Manipulation on GPU
This paper proposes a novel hardware-accelerated deformation algorithm based on a curve-skeleton model for 2D shape manipulation. The deformation algorithm can achieve real-time interactive shape manipulation without any pre-computing step. The deforming regions of shapes are demarcated with a simple skeleton frame and are simulated by a curve-skeleton model consisting of triangle strips. The algorithm obtains two […]
Apr, 7
High throughput multiple-precision GCD on the CUDA architecture
Investigation of the cryptanalytic strength of RSA cryptography requires computing many GCDs of two long integers (e.g., of length 1024 bits). This paper presents a high throughput parallel algorithm to perform many GCD computations concurrently on a GPU based on the CUDA architecture. The experiments with an NVIDIA GeForce GTX285 GPU and a single core […]
Apr, 7
Program Optimization of Stencil Based Application on the GPU-Accelerated System
Graphics Processing Units (GPUs), with many lightweight data-parallel cores, can provide substantial parallel computational power to accelerate general purpose applications. However, this computing capacity cannot be fully utilized for memory-intensive applications, which are limited by off-chip memory bandwidth and latency. Stencil computation has abundant parallelism and low computational intensity, which make it a […]
Apr, 7
Scalability of Higher-Order Discontinuous Galerkin FEM Computations for Solving Electromagnetic Wave Propagation Problems on GPU Clusters
A highly parallel implementation of Maxwell’s equations in the time domain using a cluster of Graphics Processing Units (GPUs) is presented. The higher-order Discontinuous Galerkin Finite Element Method (DG-FEM) is used for spatial discretization since its characteristics match the parallelization design aspects of the NVIDIA Compute Unified Device Architecture (CUDA) programming model. Asynchronous data […]
Apr, 7
Using GPU to Accelerate Cache Simulation
Caches play a major role in the performance of high speed computer systems. Trace-driven simulation is the most widely used method to evaluate cache architectures. However, cache designs are moving to more complicated architectures, and traces are becoming larger and larger. Traditional simulation methods are no longer practical […]
Apr, 7
Accelerating Phase Correlation Functions Using GPU and FPGA
In this paper, we present a comparative study of implementations of the phase correlation function using GPUs, ASIC and FPGAs. The Phase Only Correlation (POC) method demonstrates high robustness and subpixel accuracy in pattern matching and image registration. However, it has a disadvantage in computational speed because of calculations such as the 2D FFT. We have […]
Apr, 7
A Neighborhood Grid Data Structure for Massive 3D Crowd Simulation on GPU
Simulation and visualization of emergent crowds in real time is a computationally intensive task. This intensity mostly comes from the O(n^2) complexity of the traversal algorithm, necessary for the proximity queries of all pairs of entities in order to compute the relevant mutual interactions. Previous works reduced this complexity by considerable factors, using adequate data structures […]