Posts
Apr, 21
GPU Encrypt: AES Encryption on Mobile Devices
In this report, we have taken the first steps in investigating the feasibility of using the GPU as a cryptographic accelerator for the AES algorithm on mobile devices. In particular, our focus was on exploring the use of OpenCL as a framework for implementing the algorithm. Using modifications of an existing implementation [11], we first […]
Apr, 21
Toward optimised skeletons for heterogeneous parallel architecture with performance cost model
High performance architectures are increasingly heterogeneous with shared and distributed memory components, and accelerators like GPUs. Programming such architectures is complicated and performance portability is a major issue as the architectures evolve. This thesis explores the potential for algorithmic skeletons integrating a dynamically parametrised static cost model, to deliver portable performance for mostly regular data […]
Apr, 21
SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors
The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, the high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for big databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate […]
Apr, 21
Fast Efficient Artificial Neural Network for Handwritten Digit Recognition
Handwriting recognition is in high demand in both commercial and academic settings. In recent years, a lot of good work has been done on handwritten digit recognition to improve accuracy. A handwritten digit recognition system needs a larger dataset and a long training time to improve accuracy and reduce the error rate. Training of Neural Networks for large data sets is […]
Apr, 21
Sparser, Better, Faster GPU Parsing
Due to their origin in computer graphics, graphics processing units (GPUs) are highly optimized for dense problems, where the exact same operation is applied repeatedly to all data points. Natural language processing algorithms, on the other hand, are traditionally constructed in ways that exploit structural sparsity. Recently, Canny et al. (2013) presented an approach to […]
Apr, 21
Content Based Image Retrieval with Graphical Processing Unit
CBIR is a method of searching for digital images in an image database. "Content-based" means that the search analyzes the contents of the image rather than metadata such as keywords or tags; "content" here refers to colours, shapes, textures, or any other information that can be derived from the image itself. The GPU is a powerful graphics engine and a highly […]
Apr, 21
A GPU-Based Enhanced Genetic Algorithm for Power-Aware Task Scheduling Problem in HPC Cloud
In this paper, we consider power-aware task scheduling (PATS) in HPC clouds. Users request virtual machines (VMs) to execute their tasks. Each task is executed on one single VM, and requires a fixed number of cores (i.e., processors), computing power (million instructions per second – MIPS) of each core, a fixed start time and non-preemption […]
Apr, 21
Rapid Rabbit: Highly Optimized GPU Accelerated Cone-Beam CT Reconstruction
Graphical processing units (GPUs) have become widely adopted in the medical imaging community. The parallel SIMD nature of GPUs maps perfectly to many reconstruction algorithms. Because of this, it is relatively straightforward to parallelize common reconstruction algorithms (e.g. FDK backprojection). This means that significant performance improvements must come from careful memory optimizations, exploiting ASICs and […]
Apr, 21
GACO: A GPU-based High Performance Parallel Multi-ant Colony Optimization Algorithm
As a population-based algorithm, Ant Colony Optimization (ACO) is intrinsically massively parallel, and therefore it is expected to be well-suited for implementation on GPUs (Graphics Processing Units). In this paper, we present a novel ant colony optimization algorithm (called GACO), which is based on a Compute Unified Device Architecture (CUDA) enabled GPU. In the GACO algorithm, we utilize […]
Apr, 21
Computational cost estimates for parallel shared memory isogeometric multi-frontal solvers
In this paper we present computational cost estimates for a parallel shared memory isogeometric multi-frontal solver. The estimates show that the ideal isogeometric shared memory parallel direct solver scales as O(p^2 log(N/p)) for one-dimensional problems, O(Np^2) for two-dimensional problems, and O(N^(4/3)p^2) for three-dimensional problems, where N is the number of degrees of freedom, […]
Apr, 19
An Automated Tool for Converting Directive Based C Code Into Parallel CUDA Code
Parallel programming has become more accessible with the advent of GPGPUs. Nowadays many programmers port their applications to GPGPUs using APIs such as NVIDIA's CUDA. However, writing a CUDA program is a tricky task. Much of the industry relies extensively on large bodies of serial C code, and they are […]
Apr, 19
Collision Detection Based on Fuzzy Scene Subdivision
We present a novel approach to perform collision detection queries between rigid and/or deformable models. Our method can handle arbitrary deformations and even discontinuous ones. For this, we subdivide the whole scene with all objects into connected but totally independent parts by a fuzzy clustering algorithm. Then, for every part, our algorithm performs a Principal […]