Posts
Sep, 22
Modification of self-organizing migration algorithm for OpenCL framework
This paper deals with a modification of the self-organizing migration algorithm using the OpenCL framework. The modification allows the algorithm to exploit modern parallel devices such as central processing units and graphics processing units. The main aim was to create an algorithm that shows a significant speedup over the sequential variant. The second aim was to make the algorithm robust […]
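As a rough illustration of the per-individual parallelism such a modification exposes, here is a minimal CUDA sketch of one migration step of SOMA (self-organizing migration algorithm) in the AllToOne strategy. The sphere objective, the fixed Step/PathLength constants and the pre-generated PRT mask are assumptions for the sketch, not details from the paper, and the paper itself targets OpenCL rather than CUDA; host setup is omitted.

```cuda
// Minimal CUDA sketch of one SOMA migration step (AllToOne strategy).
// Assumptions not taken from the post: sphere objective, one thread per
// individual, pre-generated PRT mask; the paper's own code uses OpenCL.
#include <cuda_runtime.h>

#define DIM 8
#define STEP 0.11f
#define PATH_LENGTH 3.0f

__device__ float sphere(const float *x) {          // toy objective function
    float s = 0.0f;
    for (int d = 0; d < DIM; ++d) s += x[d] * x[d];
    return s;
}

__global__ void somaMigrate(float *pop, float *fitness,
                            const float *leader, const float *prtMask,
                            int popSize) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one individual per thread
    if (i >= popSize) return;

    float *x = &pop[i * DIM];
    float best[DIM], trial[DIM];
    for (int d = 0; d < DIM; ++d) best[d] = x[d];
    float bestFit = fitness[i];

    // Walk towards the leader along the PRT-perturbed direction,
    // keeping the best position found on the path.
    for (float t = STEP; t <= PATH_LENGTH; t += STEP) {
        for (int d = 0; d < DIM; ++d)
            trial[d] = x[d] + (leader[d] - x[d]) * t * prtMask[i * DIM + d];
        float f = sphere(trial);
        if (f < bestFit) {
            bestFit = f;
            for (int d = 0; d < DIM; ++d) best[d] = trial[d];
        }
    }
    for (int d = 0; d < DIM; ++d) x[d] = best[d];
    fitness[i] = bestFit;
}
```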
Sep, 21
Large-Scale Motion Modelling using a Graphical Processing Unit
The increased availability of Graphical Processing Units (GPUs) in personal computers has made parallel programming worthwhile and more accessible, but not necessarily easier. This thesis takes advantage of the power of a GPU, in conjunction with the Central Processing Unit (CPU), to simulate target trajectories for large-scale scenarios, such as wide-area maritime […]
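For a sense of how such trajectory simulation maps onto a GPU, here is a minimal sketch of per-target state propagation. The nearly-constant-velocity motion model and the State layout are assumptions of the sketch, not details taken from the thesis excerpt.

```cuda
// Illustrative CUDA kernel propagating many target states in parallel.
// Constant-velocity model and State layout are assumed, not from the thesis.
#include <cuda_runtime.h>

struct State { float x, y, vx, vy; };

__global__ void propagate(State *targets, int n, float dt) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one target per thread
    if (i >= n) return;
    State s = targets[i];
    s.x += s.vx * dt;        // constant-velocity prediction over one time step
    s.y += s.vy * dt;
    targets[i] = s;
}

// Typical launch: propagate<<<(n + 255) / 256, 256>>>(d_targets, n, dt);
```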
Sep, 21
Some examples of instant computations of fluid dynamics on GPU
This paper is a summary of our experience with GPU and GPGPU computing for two-dimensional computational fluid dynamics on fine grids and for three-dimensional kinetic transport problems. The choice of computational approach is clearly critical for both speedup and efficiency. In our numerical experiments, we used a Lattice Boltzmann method (LBM) for the […]
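As a rough sketch of the kind of kernel an LBM solver runs on the GPU, here is the BGK collision step of a D2Q9 lattice, assuming a structure-of-arrays layout f[q * nCells + cell]. The D2Q9/BGK choice is an assumption; the paper's actual discretisation, streaming step and boundary handling are not shown.

```cuda
// BGK collision step of a D2Q9 Lattice Boltzmann solver (collision only;
// streaming, boundaries and host setup omitted). Layout is an assumption.
#include <cuda_runtime.h>

__constant__ float w[9]  = { 4.f/9,  1.f/9,  1.f/9,  1.f/9,  1.f/9,
                             1.f/36, 1.f/36, 1.f/36, 1.f/36 };
__constant__ int   cx[9] = { 0, 1, 0, -1,  0, 1, -1, -1,  1 };
__constant__ int   cy[9] = { 0, 0, 1,  0, -1, 1,  1, -1, -1 };

__global__ void bgkCollide(float *f, int nCells, float omega) {
    int cell = blockIdx.x * blockDim.x + threadIdx.x;  // one lattice node per thread
    if (cell >= nCells) return;

    // Macroscopic density and velocity from the nine populations.
    float rho = 0.f, ux = 0.f, uy = 0.f;
    float fi[9];
    for (int q = 0; q < 9; ++q) {
        fi[q] = f[q * nCells + cell];
        rho += fi[q];
        ux  += fi[q] * cx[q];
        uy  += fi[q] * cy[q];
    }
    ux /= rho;  uy /= rho;
    float usq = ux * ux + uy * uy;

    // Relax each population towards its equilibrium (BGK operator).
    for (int q = 0; q < 9; ++q) {
        float cu  = cx[q] * ux + cy[q] * uy;
        float feq = w[q] * rho * (1.f + 3.f * cu + 4.5f * cu * cu - 1.5f * usq);
        f[q * nCells + cell] = fi[q] - omega * (fi[q] - feq);
    }
}
```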
Sep, 21
Parallelization of Hierarchical Text Clustering on Multi-core CUDA Architecture
Text clustering is the problem of dividing text documents into groups such that documents in the same group are similar to one another and different from documents in other groups. Because texts tend to form hierarchies, text clustering is best performed using a hierarchical clustering method. An important aspect while clustering large […]
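The dominant cost in hierarchical (agglomerative) clustering is the pairwise similarity matrix, which parallelises naturally. A hedged CUDA sketch follows, assuming dense, L2-normalised tf-idf document vectors; the excerpt does not specify the paper's document representation.

```cuda
// Pairwise cosine-similarity matrix for agglomerative text clustering.
// Dense, L2-normalised row-major document vectors are assumed.
#include <cuda_runtime.h>

__global__ void cosineSim(const float *docs,  // nDocs x dim, rows unit-length
                          float *sim,         // nDocs x nDocs output
                          int nDocs, int dim) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nDocs || j >= nDocs) return;

    float dot = 0.f;                      // dot product of two unit vectors
    for (int k = 0; k < dim; ++k)
        dot += docs[i * dim + k] * docs[j * dim + k];
    sim[i * nDocs + j] = dot;             // equals the cosine similarity
}

// Launch with a 2D grid, e.g. dim3 block(16, 16) and
// dim3 grid((nDocs + 15) / 16, (nDocs + 15) / 16).
```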
Sep, 21
Fast and Efficient Automatic Memory Management for GPUs using Compiler-Assisted Runtime Coherence Scheme
Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to […]
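A rough sketch of the idea behind such a runtime coherence scheme: track which copy of each buffer is stale and transfer lazily only when the other side touches the data. The class and its interface below are illustrative only; the paper's compiler analysis and the calls it inserts are not reproduced.

```cuda
// Illustrative lazy host/device coherence wrapper (not the paper's runtime).
#include <cuda_runtime.h>
#include <cstddef>

class CoherentBuffer {
    float *h_, *d_;
    size_t bytes_;
    bool deviceDirty_ = false, hostDirty_ = false;
public:
    explicit CoherentBuffer(size_t n) : bytes_(n * sizeof(float)) {
        h_ = new float[n];
        cudaMalloc(&d_, bytes_);
    }
    ~CoherentBuffer() { delete[] h_; cudaFree(d_); }

    float *hostPtr() {                   // CPU is about to read or write
        if (deviceDirty_) {              // pull the newer GPU copy first
            cudaMemcpy(h_, d_, bytes_, cudaMemcpyDeviceToHost);
            deviceDirty_ = false;
        }
        hostDirty_ = true;
        return h_;
    }
    float *devicePtr() {                 // a GPU kernel is about to run
        if (hostDirty_) {                // push the newer CPU copy first
            cudaMemcpy(d_, h_, bytes_, cudaMemcpyHostToDevice);
            hostDirty_ = false;
        }
        deviceDirty_ = true;
        return d_;
    }
};
```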
Sep, 21
Autotuning Wavefront Abstractions for Heterogeneous Architectures
We present our autotuned heterogeneous parallel programming abstraction for the wavefront pattern. An exhaustive search of the tuning space indicates that correctly setting the tuning factors yields an average 37x speedup over a sequential baseline. Our best automated machine-learning-based heuristic obtains 92% of this ideal speedup, averaged across our full range of wavefront examples.
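For readers unfamiliar with the pattern, a minimal CUDA sketch of a wavefront sweep: cells on the same anti-diagonal are independent, so each diagonal is one parallel step. The toy recurrence and the threads-per-block tuning factor are placeholders, not the paper's abstraction or benchmarks.

```cuda
// Wavefront sweep over an n x n grid, one kernel launch per anti-diagonal.
// The recurrence is a toy min-cost dependence on the three upper-left cells.
#include <cuda_runtime.h>

__global__ void diagStep(float *grid, int n, int diag) {
    // Cells (i, j) with i + j == diag and 1 <= i, j < n.
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int i = max(1, diag - (n - 1)) + t;
    int j = diag - i;
    if (i >= n || j < 1 || j >= n) return;

    float up    = grid[(i - 1) * n + j];
    float left  = grid[i * n + (j - 1)];
    float diagv = grid[(i - 1) * n + (j - 1)];
    grid[i * n + j] += fminf(diagv, fminf(up, left));   // toy dependence
}

void wavefrontSweep(float *d_grid, int n, int threadsPerBlock) {
    for (int diag = 2; diag <= 2 * (n - 1); ++diag) {   // host drives the sweep
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        diagStep<<<blocks, threadsPerBlock>>>(d_grid, n, diag);
    }
    cudaDeviceSynchronize();
}
```

Threads-per-block here stands in for the kind of tuning factor the post's autotuner would choose automatically.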
Sep, 20
Charged particles constrained to a curved surface
We study the motion of charged particles constrained to arbitrary two-dimensional curved surfaces but interacting in three-dimensional space via the Coulomb potential. To speed up the interaction calculations, we use the parallel compute capability of the Compute Unified Device Architecture (CUDA) of today's graphics boards. The particles and the curved surfaces are shown using the Open […]
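The expensive part of such a simulation is the all-pairs Coulomb interaction. A hedged CUDA sketch of the brute-force O(N^2) force kernel follows; the softening term, unit Coulomb constant and float3 layout are assumptions, and the surface constraint and rendering from the post are not shown.

```cuda
// Brute-force all-pairs Coulomb forces: F_i = sum_j q_i q_j (r_i - r_j) / r^3.
// Unit Coulomb constant and a small softening term are assumed for the sketch.
#include <cuda_runtime.h>

__global__ void coulombForces(const float3 *pos, const float *charge,
                              float3 *force, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one particle per thread
    if (i >= n) return;

    float3 pi = pos[i];
    float3 f = make_float3(0.f, 0.f, 0.f);
    for (int j = 0; j < n; ++j) {                   // brute-force pair loop
        if (j == i) continue;
        float3 d = make_float3(pi.x - pos[j].x, pi.y - pos[j].y, pi.z - pos[j].z);
        float r2 = d.x * d.x + d.y * d.y + d.z * d.z + 1e-12f;  // softening
        float invR3 = rsqrtf(r2) / r2;              // 1 / r^3
        float s = charge[i] * charge[j] * invR3;
        f.x += s * d.x;  f.y += s * d.y;  f.z += s * d.z;
    }
    force[i] = f;
}
```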
Sep, 20
Evolutionary Clustering on CUDA
Unsupervised clustering of large data sets is a complicated task. Due to its complexity, various meta-heuristic machine learning algorithms have been used to automate the clustering process. Genetic and evolutionary algorithms have been deployed successfully to find clusters in data sets. GPU computing is a recent programming paradigm introducing high-performance parallel computing […]
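A hedged sketch of the part that maps best onto CUDA: scoring every candidate solution against the whole data set in parallel. The centroid-based chromosome encoding and the within-cluster squared-distance fitness are assumptions; the excerpt does not give the paper's encoding.

```cuda
// Parallel fitness evaluation for a population of clustering candidates.
// Each chromosome is assumed to encode k centroids; fitness is the summed
// squared distance of every point to its nearest centroid (pre-zeroed).
#include <cuda_runtime.h>
#include <float.h>

__global__ void evalPopulation(const float *data,      // nPoints x dim
                               const float *centroids, // popSize x k x dim
                               float *fitness,         // popSize, zero-initialised
                               int nPoints, int dim, int k) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;  // data point index
    int c = blockIdx.y;                             // chromosome index
    if (p >= nPoints) return;

    const float *cent = &centroids[c * k * dim];
    float best = FLT_MAX;
    for (int j = 0; j < k; ++j) {                   // distance to nearest centroid
        float d2 = 0.f;
        for (int a = 0; a < dim; ++a) {
            float diff = data[p * dim + a] - cent[j * dim + a];
            d2 += diff * diff;
        }
        best = fminf(best, d2);
    }
    atomicAdd(&fitness[c], best);   // lower total distance = fitter chromosome
}

// Launch with dim3 grid((nPoints + 255) / 256, popSize), 256 threads per block.
```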
Sep, 20
Binaural Simulations Using Audio Rate FDTD Schemes and CUDA
Three-dimensional finite difference time domain (FDTD) schemes can be used as an approach to spatial audio simulation. By embedding a model of the human head in a 3D computational space, such simulations can emulate binaural sound localisation. This approach normally relies on high sample rates to give finely detailed models, and is computationally intensive. […]
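A minimal CUDA sketch of one time step of such a stencil, here a leapfrog update of the scalar wave equation on a regular 3D grid; boundary handling and the embedded head geometry are omitted, and the post's exact scheme may differ.

```cuda
// One FDTD time step for the 3D scalar wave equation (interior points only):
// p^{n+1} = 2 p^n - p^{n-1} + lambda^2 * laplacian(p^n), lambda = c*dt/dx.
#include <cuda_runtime.h>

__global__ void fdtdStep(const float *p, const float *pPrev, float *pNext,
                         int nx, int ny, int nz, float lambda2) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < 1 || y < 1 || z < 1 || x >= nx - 1 || y >= ny - 1 || z >= nz - 1)
        return;

    int idx = (z * ny + y) * nx + x;
    float lap = p[idx - 1] + p[idx + 1]          // 7-point Laplacian stencil
              + p[idx - nx] + p[idx + nx]
              + p[idx - nx * ny] + p[idx + nx * ny]
              - 6.f * p[idx];
    pNext[idx] = 2.f * p[idx] - pPrev[idx] + lambda2 * lap;
}
```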
Sep, 20
Forecasting high frequency financial time series using parallel FFN with CUDA and ZeroMQ
Feed-forward neural networks (FFNs) are powerful data-modelling tools that have been used in many fields of science. In financial applications specifically, the number of factors affecting the market leads to models with a large number of input features and of hidden and output neurons. In financial problems, response time is crucial, and […]
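A hedged CUDA sketch of the core kernel a parallel FFN predictor runs, a single dense-layer forward pass; the sigmoid activation and row-major weight layout are assumptions, and the ZeroMQ transport from the post is outside the scope of the snippet.

```cuda
// Dense-layer forward pass: y = sigmoid(W x + b), one output neuron per thread.
// Activation choice and weight layout are assumptions of this sketch.
#include <cuda_runtime.h>

__global__ void denseForward(const float *W,     // nOut x nIn, row-major
                             const float *bias,  // nOut
                             const float *x,     // nIn (one input vector)
                             float *y,           // nOut
                             int nIn, int nOut) {
    int o = blockIdx.x * blockDim.x + threadIdx.x;  // one output neuron per thread
    if (o >= nOut) return;

    float acc = bias[o];
    for (int k = 0; k < nIn; ++k)
        acc += W[o * nIn + k] * x[k];
    y[o] = 1.f / (1.f + expf(-acc));   // sigmoid activation
}

// Chaining this kernel layer by layer gives the full forward pass.
```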
Sep, 20
GPU-Acceleration of Linear Algebra using OpenCL
In this report we have created a linear algebra API using OpenCL for use with MATLAB. We have demonstrated that the individual linear algebra components can be faster on the GPU than on the CPU. We found that the API is heavily memory bound, but still faster than MATLAB in our test case. The API components […]
Sep, 19
Direct GPU/FPGA Communication Via PCI Express
Parallel processing has hit mainstream computing in the form of CPUs, GPUs and FPGAs. While exploration proceeds on all three platforms individually and on the CPU-GPU pair, little has been done to exploit the synergy of GPU and FPGA. This is due in part to the cumbersome nature of communication between the two. This paper presents a […]