Posts
Apr, 9
Real-time parallel remote rendering for mobile devices using graphics processing units
Demand for 3D visualization is increasing in mobile devices as users have come to expect more realistic immersive experiences. However, limited networking and computing resources on mobile devices remain challenges. A solution is to have a proxy-based framework that offloads the burden of rendering computation from mobile devices to more powerful servers. We present the […]
Apr, 9
A Parallel Gibbs Sampling Algorithm for Motif Finding on GPU
A motif is an overrepresented pattern in biological sequences, and motif finding is an important problem in bioinformatics. Due to the high computational complexity of motif finding, more and more computational capability is required as the volume of available biological data, such as gene transcription data, grows rapidly. Among many motif finding algorithms, Gibbs sampling is an effective method […]
Apr, 9
The method of improving performance of the GPU-accelerated 2D FDTD simulator
In this paper, several methods of optimizing parallel implementation of 2D FDTD algorithm are presented. Some practical problems occurring in real simulations are taken into consideration. Moreover, the presented methods are supported with appropriate tests and practical examples.
Apr, 8
Parallel Dense Gauss-Seidel Algorithm on Many-Core Processors
The Gauss-Seidel method is very efficient for solving problems such as tightly-coupled constraints with possible redundancies. However, the underlying algorithm is inherently sequential. Previous works have exploited sparsity in the system matrix to extract parallelism. In this paper, we propose to study several parallelization schemes for fully-coupled systems, which cannot be parallelized by existing methods, […]
Apr, 8
Optimize or Wait? Using llc Fast-Prototyping Tool to Evaluate CUDA Optimizations
Over the last few years, we have witnessed the proliferation of GPU devices on HPC environments. Manufacturers produce new versions of their devices every few years, though, posing a new problem for scientists and engineers using their technology: is it worth the time and effort spent optimizing the codes for the current version? Or it is […]
Apr, 8
Support Vector Machines on GPU with Sparse Matrix Format
The emerging general-purpose Graphics Processing Unit (GPU) provides a multi-core platform for a wide range of applications, including machine learning algorithms. In this paper, we propose several techniques to accelerate Support Vector Machines (SVM) on GPUs. A sparse matrix format is introduced into parallel SVM to achieve better performance. Experimental results show that a speedup of 55x-133.8x over LIBSVM can […]
Apr, 8
High-Speed Implementations of Block Cipher ARIA Using Graphics Processing Units
The power of graphics processing units (GPUs) has been increasing more rapidly than that of CPUs. It is not surprising that many software libraries have been developed that enable us to use the power of GPUs for general computations, especially in parallel data processing. In this paper, we propose implementations of the standard block cipher ARIA of […]
Apr, 8
Record Setting Software Implementation of DES Using CUDA
The increase in computational power of off-the-shelf hardware offers more and more advantageous tradeoffs among efficiency, cost and availability, thus enhancing the feasibility of cryptanalytic attacks aiming to lower the security of widely used cryptosystems. In this paper we illustrate a GPU-based software implementation of the most efficient variant of Data Encryption Standard (DES), […]
Apr, 8
On accelerating iterative algorithms with CUDA: A case study on Conditional Random Fields training algorithm for biological sequence alignment
The accuracy of Conditional Random Fields (CRF) is achieved at the cost of a huge amount of computation to train the model. In this paper we design a parallelized algorithm for Gradient Ascent based CRF training methods for biological sequence alignment. Our contribution lies mainly in two aspects: 1) We flexibly parallelized the different iterative computation […]
Apr, 8
A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation
In this paper, we describe the implementation of gravitational force calculation for N-body simulations in the context of astrophysics. We describe high performance implementations on general purpose processors, GPUs, and FPGAs, and compare them using a number of criteria including speed performance, power efficiency and cost of development. These results show that, for gravitational […]
Apr, 8
Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+
The Graphics Processing Unit (GPU), with many light-weight data-parallel cores, can provide substantial parallel computing power to accelerate several general purpose applications. Both AMD and NVIDIA provide their own high performance GPUs and software platforms. As the floating-point computing capacity increases continually, the “memory wall” problem becomes more serious, especially for array-intensive applications. In […]
Apr, 8
CANSCID-CUDA
The 2010 MEMOCODE Hardware Software Co-design challenge is to implement a Deep Packet Inspection architecture called CANSCID (Combined Architecture for Stream Categorization and Intrusion Detection). In this short paper, we present the design details of our submission, which uses a Graphics Processing Unit (GPU) to accelerate parallel regular expression matching. The target […]