Posts

Sep, 21

Autotuning Wavefront Abstractions for Heterogeneous Architectures

We present our autotuned heterogeneous parallel programming abstraction for the wavefront pattern. An exhaustive search of the tuning space indicates that correct setting of tuning factors can average 37x speedup over a sequential baseline. Our best automated machine learning based heuristic obtains 92% of this ideal speedup, averaged across our full range of wavefront examples.
Sep, 20

Charged particles constrained to a curved surface

We study the motion of charged particles constrained to arbitrary two-dimensional curved surfaces but interacting in three-dimensional space via the Coulomb potential. To speed up the interaction calculations, we use the parallel compute capability of the Compute Unified Device Architecture (CUDA) of today's graphics boards. The particles and the curved surfaces are shown using the Open […]
Sep, 20

Evolutionary Clustering on CUDA

Unsupervised clustering of large data sets is a complicated task. Due to its complexity, various meta-heuristic machine learning algorithms have been used to automate the clustering process. Genetic and evolutionary algorithms have been deployed to find clusters in data sets with success. GPU computing is a recent programming paradigm introducing high performance parallel computing […]
Sep, 20

Binaural Simulations Using Audio Rate FDTD Schemes and CUDA

Three-dimensional finite difference time domain schemes can be used as an approach to spatial audio simulation. By embedding a model of the human head in a 3D computational space, such simulations can emulate binaural sound localisation. This approach normally relies on using high sample rates to give finely detailed models, and is computationally intensive. […]
Sep, 20

Forecasting high frequency financial time series using parallel FFN with CUDA and ZeroMQ

Feed forward neural networks (FFNs) are powerful data-modelling tools that have been used in many fields of science. Specifically in financial applications, due to the number of factors affecting the market, models with a large quantity of input features, hidden and output neurons can be obtained. In financial problems, the response time is crucial and […]
Sep, 20

GPU-Acceleration of Linear Algebra using OpenCL

In this report we’ve created a linear algebra API using OpenCL, for use with MATLAB. We’ve demonstrated that the individual linear algebra components can be faster when using the GPU as compared to the CPU. We found that the API is heavily memory bound, but still faster than MATLAB in our test case. The API components […]
Sep, 19

Direct GPU/FPGA Communication Via PCI Express

Parallel processing has hit mainstream computing in the form of CPUs, GPUs and FPGAs. While explorations proceed with all three platforms individually and with the CPU-GPU pair, little exploration has been performed with the synergy of GPU-FPGA. This is due in part to the cumbersome nature of communication between the two. This paper presents a […]
Sep, 19

Simulating spiking neural networks on GPU

Modern graphics cards contain hundreds of cores that can be programmed for intensive calculations. They are beginning to be used for spiking neural network simulations. The goal is to make parallel simulation of spiking neural networks available to a large audience, without the requirements of a cluster. We review the ongoing efforts towards this goal, […]
Sep, 19

Parallelization of a Block-Matching Algorithm

In this work we present a parallelization technique, together with its GPU implementation, for the full-search block-matching algorithm. This problem consists in finding the block that best matches a given reference template in terms of some photometric measure within a predefined search area. Block matching is a fundamental processing step for many signal-processing applications. Its […]
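As a rough illustration of the full-search idea described in this excerpt, the sketch below scans every candidate position in a search area and keeps the one with the lowest cost. It is a sequential toy version, not the paper's GPU implementation. The sum of absolute differences (SAD) is assumed here as the photometric measure, and the names `sad`, `full_search`, and `search_area` are hypothetical.

```python
def sad(frame, template, top, left):
    """Sum of absolute differences between the template and the
    frame block whose top-left corner is at (top, left)."""
    return sum(
        abs(frame[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def full_search(frame, template, search_area):
    """Exhaustively test every candidate top-left position.

    search_area is (top0, left0, top1, left1), inclusive bounds on the
    candidate top-left corners (an assumption made for this sketch).
    Returns ((top, left), cost) for the best match.
    """
    top0, left0, top1, left1 = search_area
    best_pos, best_cost = None, float("inf")
    for top in range(top0, top1 + 1):
        for left in range(left0, left1 + 1):
            cost = sad(frame, template, top, left)
            if cost < best_cost:
                best_pos, best_cost = (top, left), cost
    return best_pos, best_cost


# Small made-up example: the template appears exactly at row 1, column 1.
frame = [
    [1, 2, 3, 4],
    [5, 9, 8, 6],
    [7, 7, 6, 5],
    [0, 1, 2, 3],
]
template = [[9, 8], [7, 6]]
position, cost = full_search(frame, template, (0, 0, 2, 2))
```

Each candidate position is evaluated independently of the others, which is what makes this search a natural fit for parallel hardware.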
Sep, 19

Beauty And The Beast: Exploiting GPUs In Haskell

In this paper we compare a Haskell system that exploits a GPU back end using Obsidian against a number of other GPU/parallel processing systems. Our examples demonstrate two major results. Firstly, they show that the Haskell system allows the applications programmer to exploit GPUs in a manner that eases the development of parallel code by […]
Sep, 19

Gauge fixing using overrelaxation and simulated annealing on GPUs

We adopt CUDA-capable Graphics Processing Units (GPUs) for Coulomb, Landau and maximally Abelian gauge fixing in 3+1 dimensional SU(3) lattice gauge field theories. The local overrelaxation algorithm is perfectly suited for highly parallel architectures. Simulated annealing preconditioning strongly increases the probability of reaching the global maximum of the gauge functional. We give performance results for […]
Sep, 18

Implementation of QR Updating Algorithms on the GPU

The least squares problem is an extremely useful device to represent an approximate solution to overdetermined systems, and a QR factorisation is a common method for solving least squares problems. It is often the case that multiple least squares solutions have to be computed with only minor changes in the underlying data. In this case, […]
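For readers unfamiliar with how a QR factorisation solves a least squares problem, here is a minimal sequential sketch. It factors A = QR with modified Gram-Schmidt and then solves Rx = Qᵀb by back substitution. It does not show the updating algorithms or the GPU implementation from the paper. The function name `qr_least_squares` is hypothetical, and the sketch assumes A has full column rank.

```python
def qr_least_squares(A, b):
    """Solve min ||Ax - b||_2 via a thin QR factorisation.

    A is a list of m rows with n entries each (m >= n, full column
    rank assumed); b is a list of m values.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    Q = []                                  # orthonormal columns of Q
    R = [[0.0] * n for _ in range(n)]       # upper-triangular factor

    # Modified Gram-Schmidt: orthogonalise each column against the
    # previously computed columns of Q.
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            R[k][j] = sum(Q[k][i] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(m)]
        R[j][j] = sum(x * x for x in v) ** 0.5
        Q.append([x / R[j][j] for x in v])

    # y = Q^T b, then back substitution on R x = y.
    y = [sum(Q[j][i] * b[i] for i in range(m)) for j in range(n)]
    x = [0.0] * n
    for j in reversed(range(n)):
        s = y[j] - sum(R[j][k] * x[k] for k in range(j + 1, n))
        x[j] = s / R[j][j]
    return x


# Made-up overdetermined system: fit y = c0 + c1 * t to four points
# that lie exactly on y = 1 + 2t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
coeffs = qr_least_squares(A, b)
```

Recomputing the whole factorisation after every small change to A or b repeats most of this work, which is the cost that updating algorithms aim to avoid.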

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org