Posts
Dec, 18
Face Detection with Improved Local Binary Patterns in CUDA
As mobile computing and user interactivity become more ubiquitous, accurate and fast facial detection mechanisms are necessary. With the development of accessible parallel computing, it becomes possible to leverage parallel algorithms to increase both the speed and accuracy of facial detection systems. In this paper, we propose and analyse one such system […]
Dec, 18
Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration
Heterogeneous computing systems increase the performance of parallel computing in many domains of general-purpose computing with CPUs, GPUs and other accelerators. Alongside hardware developments, software developments such as Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) try to offer a simple and visualized tool for parallel computing. But it turns out to be […]
Dec, 18
Parallelisation of Shallow Water Simulation for Heterogeneous Architectures
This work presents the parallelisation of a shallow water simulation model. Two parallel implementations are developed. One is for a multi-core NUMA architecture, developed in OpenMP. The other one is for a many-core GPU-accelerated architecture and is developed in OpenCL. The parallelisation process is based on an iterative approach, starting off from a naive implementation. […]
Dec, 18
Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes
Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse […]
Dec, 18
Improved FCM algorithm for Clustering on Web Usage Mining
The FCM clustering method is very sensitive to the initial center values, places overly high requirements on the data set, and cannot handle noisy data. The method proposed in this paper uses information entropy to initialize the cluster centers and introduces weighting parameters to adjust the location of the cluster centers and address noise problems. The navigation […]
Dec, 18
Implementation of 3D FFTs Across Multiple GPUs in Shared Memory Environments
In this paper, a novel implementation of the distributed 3D Fast Fourier Transform (FFT) on a multi-GPU platform using CUDA is presented. The 3D FFT is the core of many simulation methods, thus its fast calculation is critical. The main bottleneck of the distributed 3D FFT is the global data exchange which must be performed. […]
Dec, 18
Theoretical and Numerical Analysis of Three Approaches to the GPGPU Application of the Explicit FDTD Method
The Finite-Difference Time-Domain (FDTD) method is a modelling technique for electromagnetic wave propagation. It has a wide range of application domains, for example geophysics, defence, microwave systems such as radar, and biomedicine. FDTD is computationally intensive but has potential for parallelisation. The use of General-Purpose computing on Graphics Processing Units (GPGPU) is examined […]
Dec, 18
Accelerating Haskell Array Codes with Algorithmic Skeletons on GPUs
GPUs have been gaining popularity as general purpose parallel processors that deliver a performance to cost ratio superior to that of CPUs. However, programming on GPUs has remained a specialised area, as it often requires significant knowledge about the GPU architecture and platform-specific parallelisation of the algorithms that are implemented. Furthermore, the dominant programming models […]
Dec, 18
Single-Pass GPU-Raycasting for Structured Adaptive Mesh Refinement Data
Structured Adaptive Mesh Refinement (SAMR) is a popular numerical technique to study processes with high spatial and temporal dynamic range. It reduces computational requirements by adapting the lattice on which the underlying differential equations are solved to most efficiently represent the solution. Particularly in astrophysics and cosmology such simulations now can capture spatial scales ten […]
Dec, 18
Database Operation Development on the GPU
The performance of database operations has always been an important factor in database research, and never more so than now, as the quantity of data is growing at an alarming rate. This, coupled with the recent growth in the use of graphics processors as general compute processors, has led to many advancements in the field […]
Dec, 16
Acceleration of multivariate analysis techniques in TMVA using GPUs
A feasibility study into the acceleration of multivariate analysis techniques using Graphics Processing Units (GPUs) will be presented. The MLP-based Artificial Neural Network method contained in the TMVA framework has been chosen as a focus for investigation. It was found that the network training time on a GPU was lower than for CPU execution as […]
Dec, 16
Accuracy, Memory, and Speed Strategies in GPU-Based Finite-Element Matrix-Generation
This letter presents strategies on how to optimize graphics processing unit (GPU)-based finite-element matrix-generation that occurs in the finite element method (FEM) using higher-order curvilinear elements. The goal of the optimization is to increase the speed of evaluation and assembly of large finite-element matrices on a single GPU while maintaining the accuracy of numerical integration […]