Posts

Sep, 21

Parallelization of Hierarchical Text Clustering on Multi-core CUDA Architecture

Text clustering is the problem of dividing text documents into groups, such that documents in the same group are similar to one another and different from documents in other groups. Because texts tend to form hierarchies, text clustering is best performed using a hierarchical clustering method. An important aspect when clustering large […]
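
The excerpt does not show the parallel kernel itself, so here is a minimal, hypothetical CUDA sketch of the step such methods usually offload first: computing the pairwise document-similarity matrix (cosine similarity over dense TF-IDF vectors). The sizes and dummy weights are illustrative, and the agglomerative merge loop that would consume this matrix is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per (i, j) document pair: cosine similarity of TF-IDF vectors.
// d_docs is a dense N x D matrix (row-major); d_sim is the N x N output.
__global__ void cosineSimilarity(const float* d_docs, float* d_sim, int N, int D) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N || j >= N) return;
    float dot = 0.f, ni = 0.f, nj = 0.f;
    for (int t = 0; t < D; ++t) {
        float a = d_docs[i * D + t], b = d_docs[j * D + t];
        dot += a * b; ni += a * a; nj += b * b;
    }
    d_sim[i * N + j] = dot / (sqrtf(ni) * sqrtf(nj) + 1e-12f);
}

int main() {
    const int N = 256, D = 1024;                               // illustrative sizes
    float *docs, *sim;
    cudaMallocManaged(&docs, N * D * sizeof(float));
    cudaMallocManaged(&sim,  N * N * sizeof(float));
    for (int k = 0; k < N * D; ++k) docs[k] = (k % 7) * 0.1f;  // dummy TF-IDF weights
    dim3 block(16, 16), grid((N + 15) / 16, (N + 15) / 16);
    cosineSimilarity<<<grid, block>>>(docs, sim, N, D);
    cudaDeviceSynchronize();
    printf("sim(0,1) = %f\n", sim[1]);
    cudaFree(docs); cudaFree(sim);
    return 0;
}
```
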
Sep, 21

Fast and Efficient Automatic Memory Management for GPUs using Compiler-Assisted Runtime Coherence Scheme

Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to […]
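
The paper's mechanism is compiler-driven, so nothing below is its actual API; purely as an illustration of the runtime idea, this hypothetical `CoherentBuffer` sketch tracks host/device dirty flags and issues `cudaMemcpy` lazily, which is the kind of bookkeeping a coherence scheme automates.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical runtime-side sketch: a buffer that remembers which copy is stale
// and copies only when the other side actually touches the data.
struct CoherentBuffer {
    float* host; float* dev; size_t n;
    bool hostDirty = false, devDirty = false;

    CoherentBuffer(size_t count) : n(count) {
        host = new float[n];
        cudaMalloc(&dev, n * sizeof(float));
    }
    ~CoherentBuffer() { delete[] host; cudaFree(dev); }

    float* forHostWrite()      { syncToHost(); hostDirty = true; return host; }
    const float* forHostRead() { syncToHost(); return host; }
    float* forDevice()         { syncToDevice(); devDirty = true; return dev; }

    void syncToDevice() {
        if (hostDirty) { cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice); hostDirty = false; }
    }
    void syncToHost() {
        if (devDirty) { cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost); devDirty = false; }
    }
};

__global__ void scale(float* x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    CoherentBuffer buf(1024);
    float* h = buf.forHostWrite();
    for (int i = 0; i < 1024; ++i) h[i] = 1.0f;       // CPU writes: host copy dirty
    scale<<<4, 256>>>(buf.forDevice(), 1024, 2.0f);    // triggers the H2D copy
    cudaDeviceSynchronize();
    printf("%f\n", buf.forHostRead()[0]);              // triggers the D2H copy, prints 2.0
    return 0;
}
```
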
Sep, 21

Autotuning Wavefront Abstractions for Heterogeneous Architectures

We present our autotuned heterogeneous parallel programming abstraction for the wavefront pattern. An exhaustive search of the tuning space indicates that correctly setting the tuning factors yields an average 37x speedup over a sequential baseline. Our best automated, machine-learning-based heuristic obtains 92% of this ideal speedup, averaged across our full range of wavefront examples.
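
As a point of reference for the wavefront pattern itself (not the authors' abstraction or tuning machinery), here is a minimal CUDA sketch that sweeps a 2D grid one anti-diagonal per kernel launch, using an illustrative min-plus recurrence; the factors the paper tunes (tile shape, launch geometry, device selection) are fixed to arbitrary values here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each cell depends on its north and west neighbours, so all cells on one
// anti-diagonal are independent and can be computed in parallel.
// Illustrative recurrence: M[i][j] = min(M[i-1][j], M[i][j-1]) + cost(i, j).
__global__ void wavefrontStep(float* M, int N, int diag) {
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;   // skip boundary row 0
    int j = diag - i;
    if (i >= N || j < 1 || j >= N) return;
    float cost = (float)((i * 31 + j * 17) % 5);          // dummy local cost
    M[i * N + j] = fminf(M[(i - 1) * N + j], M[i * N + j - 1]) + cost;
}

int main() {
    const int N = 512;
    float* M;
    cudaMallocManaged(&M, N * N * sizeof(float));
    for (int k = 0; k < N * N; ++k) M[k] = 0.f;           // boundary row/column stay 0
    // Sweep the anti-diagonals in order; one launch per diagonal.
    for (int diag = 2; diag <= 2 * (N - 1); ++diag)
        wavefrontStep<<<(N + 255) / 256, 256>>>(M, N, diag);
    cudaDeviceSynchronize();
    printf("M[N-1][N-1] = %f\n", M[(N - 1) * N + (N - 1)]);
    cudaFree(M);
    return 0;
}
```
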
Sep, 20

Charged particles constrained to a curved surface

We study the motion of charged particles constrained to arbitrary two-dimensional curved surfaces but interacting in three-dimensional space via the Coulomb potential. To speed up the interaction calculations, we use the parallel compute capability of the Compute Unified Device Architecture (CUDA) of today's graphics boards. The particles and the curved surfaces are shown using the Open […]
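
The excerpt only names CUDA as the accelerator for the interaction calculations; a minimal all-pairs Coulomb force kernel along those lines might look like the sketch below (one thread per particle, brute-force O(N^2), softened distances). The surface-constraint projection and the rendering from the paper are omitted, and the positions and charges are dummy values.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Brute-force O(N^2) Coulomb interaction: one thread accumulates the total
// 3D force acting on its particle. Surface-constraint handling is omitted.
__global__ void coulombForces(const float3* pos, const float* q, float3* force, int N, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float3 pi = pos[i], f = make_float3(0.f, 0.f, 0.f);
    for (int j = 0; j < N; ++j) {
        if (j == i) continue;
        float3 d = make_float3(pi.x - pos[j].x, pi.y - pos[j].y, pi.z - pos[j].z);
        float r2 = d.x * d.x + d.y * d.y + d.z * d.z + 1e-9f;   // softened distance
        float s = k * q[i] * q[j] * rsqrtf(r2) / r2;            // k*qi*qj / r^3
        f.x += s * d.x; f.y += s * d.y; f.z += s * d.z;
    }
    force[i] = f;
}

int main() {
    const int N = 1024;
    float3 *pos, *force; float* q;
    cudaMallocManaged(&pos, N * sizeof(float3));
    cudaMallocManaged(&force, N * sizeof(float3));
    cudaMallocManaged(&q, N * sizeof(float));
    for (int i = 0; i < N; ++i) {                   // dummy positions on a curved shell
        pos[i] = make_float3(sinf(0.1f * i), cosf(0.1f * i), sinf(0.05f * i));
        q[i] = 1.0f;
    }
    coulombForces<<<(N + 255) / 256, 256>>>(pos, q, force, N, 1.0f);
    cudaDeviceSynchronize();
    printf("F[0] = (%f, %f, %f)\n", force[0].x, force[0].y, force[0].z);
    return 0;
}
```
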
Sep, 20

Evolutionary Clustering on CUDA

Unsupervised clustering of large data sets is a complicated task. Due to its complexity, various meta-heuristic machine learning algorithms have been used to automate the clustering process. Genetic and evolutionary algorithms have been successfully deployed to find clusters in data sets. GPU computing is a recent programming paradigm introducing high-performance parallel computing […]
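
The excerpt does not say how the GPU is used; one common arrangement, sketched hypothetically below, is to evaluate the fitness of every individual in the population in parallel, where each individual encodes a candidate set of K centroids. The data, genome layout, and sizes are illustrative, and the selection/crossover/mutation loop is left to the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define K 4     // clusters per individual
#define DIM 2   // feature dimension

// One thread per individual: fitness = total squared distance of every data
// point to its nearest candidate centroid (lower is better).
__global__ void evalFitness(const float* data, int nPoints,
                            const float* population, int popSize, float* fitness) {
    int ind = blockIdx.x * blockDim.x + threadIdx.x;
    if (ind >= popSize) return;
    const float* cent = population + ind * K * DIM;
    float total = 0.f;
    for (int p = 0; p < nPoints; ++p) {
        float best = 1e30f;
        for (int c = 0; c < K; ++c) {
            float d2 = 0.f;
            for (int t = 0; t < DIM; ++t) {
                float diff = data[p * DIM + t] - cent[c * DIM + t];
                d2 += diff * diff;
            }
            best = fminf(best, d2);
        }
        total += best;
    }
    fitness[ind] = total;
}

int main() {
    const int nPoints = 4096, popSize = 256;
    float *data, *population, *fitness;
    cudaMallocManaged(&data, nPoints * DIM * sizeof(float));
    cudaMallocManaged(&population, popSize * K * DIM * sizeof(float));
    cudaMallocManaged(&fitness, popSize * sizeof(float));
    for (int i = 0; i < nPoints * DIM; ++i) data[i] = (float)(i % 17);           // dummy data
    for (int i = 0; i < popSize * K * DIM; ++i) population[i] = (float)(i % 13); // dummy genomes
    evalFitness<<<(popSize + 127) / 128, 128>>>(data, nPoints, population, popSize, fitness);
    cudaDeviceSynchronize();
    printf("fitness[0] = %f\n", fitness[0]);   // selection/crossover/mutation would follow on the host
    return 0;
}
```
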
Sep, 20

Binaural Simulations Using Audio Rate FDTD Schemes and CUDA

Three-dimensional finite difference time domain schemes can be used as an approach to spatial audio simulation. By embedding a model of the human head in a 3D computational space, such simulations can emulate binaural sound localisation. This approach normally relies on using high sample rates to give finely detailed models, and is computationally intensive. […]
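
As background for the scheme rather than the paper's implementation, here is a minimal CUDA sketch of the standard second-order 3D FDTD (leapfrog) update, one thread per interior cell; the embedded head geometry, boundary treatment, and binaural receiver handling are omitted, and the grid size and impulse source are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per interior grid cell: leapfrog update of the 3D wave equation
//   u_next = 2*u - u_prev + lambda^2 * (sum of 6 neighbours - 6*u)
// with lambda = c*dt/h, requiring lambda^2 <= 1/3 for stability.
__global__ void fdtdStep(const float* u, const float* uPrev, float* uNext,
                         int nx, int ny, int nz, float lambda2) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < 1 || y < 1 || z < 1 || x >= nx - 1 || y >= ny - 1 || z >= nz - 1) return;
    int i = (z * ny + y) * nx + x;
    float nbrs = u[i - 1] + u[i + 1] + u[i - nx] + u[i + nx]
               + u[i - nx * ny] + u[i + nx * ny];
    uNext[i] = 2.f * u[i] - uPrev[i] + lambda2 * (nbrs - 6.f * u[i]);
}

int main() {
    const int nx = 64, ny = 64, nz = 64, steps = 100;
    const float lambda2 = 1.f / 3.f;                 // at the Courant limit of the scheme
    size_t bytes = (size_t)nx * ny * nz * sizeof(float);
    float *u, *uPrev, *uNext;
    cudaMallocManaged(&u, bytes); cudaMallocManaged(&uPrev, bytes); cudaMallocManaged(&uNext, bytes);
    cudaMemset(u, 0, bytes); cudaMemset(uPrev, 0, bytes); cudaMemset(uNext, 0, bytes);
    u[(32 * ny + 32) * nx + 32] = 1.f;               // impulse source in the middle of the grid
    dim3 block(8, 8, 8), grid(nx / 8, ny / 8, nz / 8);
    for (int t = 0; t < steps; ++t) {
        fdtdStep<<<grid, block>>>(u, uPrev, uNext, nx, ny, nz, lambda2);
        float* tmp = uPrev; uPrev = u; u = uNext; uNext = tmp;   // rotate the time levels
    }
    cudaDeviceSynchronize();
    printf("pressure at a 'receiver' cell: %e\n", u[(32 * ny + 32) * nx + 40]);
    return 0;
}
```
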
Sep, 20

Forecasting high frequency financial time series using parallel FFN with CUDA and ZeroMQ

Feed-forward neural networks (FFNs) are powerful data-modelling tools that have been used in many fields of science. In financial applications specifically, the number of factors affecting the market leads to models with a large number of input features and hidden and output neurons. In financial problems, the response time is crucial and […]
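
The GPU-friendly core of an FFN is the per-layer weighted sum plus activation; a minimal, hypothetical CUDA sketch of a two-layer forward pass (one thread per output neuron, sigmoid activation) is shown below. The topology and weights are dummy values, and the ZeroMQ distribution layer from the paper is not shown.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per neuron of the next layer: weighted sum of the previous
// layer's activations followed by a sigmoid. W is row-major (nOut x nIn).
__global__ void denseForward(const float* W, const float* b, const float* in,
                             float* out, int nIn, int nOut) {
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o >= nOut) return;
    float s = b[o];
    for (int i = 0; i < nIn; ++i) s += W[o * nIn + i] * in[i];
    out[o] = 1.f / (1.f + expf(-s));              // sigmoid activation
}

int main() {
    const int nIn = 128, nHidden = 64, nOut = 1;  // illustrative topology
    float *W1, *b1, *W2, *b2, *x, *h, *y;
    cudaMallocManaged(&W1, nHidden * nIn * sizeof(float));
    cudaMallocManaged(&b1, nHidden * sizeof(float));
    cudaMallocManaged(&W2, nOut * nHidden * sizeof(float));
    cudaMallocManaged(&b2, nOut * sizeof(float));
    cudaMallocManaged(&x, nIn * sizeof(float));
    cudaMallocManaged(&h, nHidden * sizeof(float));
    cudaMallocManaged(&y, nOut * sizeof(float));
    for (int i = 0; i < nHidden * nIn; ++i) W1[i] = 0.01f;   // dummy "trained" weights
    for (int i = 0; i < nOut * nHidden; ++i) W2[i] = 0.01f;
    for (int i = 0; i < nHidden; ++i) b1[i] = 0.f;
    for (int i = 0; i < nOut; ++i) b2[i] = 0.f;
    for (int i = 0; i < nIn; ++i) x[i] = 0.5f;               // e.g. a window of lagged returns
    denseForward<<<(nHidden + 127) / 128, 128>>>(W1, b1, x, h, nIn, nHidden);
    denseForward<<<1, 32>>>(W2, b2, h, y, nHidden, nOut);
    cudaDeviceSynchronize();
    printf("forecast = %f\n", y[0]);
    return 0;
}
```
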
Sep, 20

GPU-Acceleration of Linear Algebra using OpenCL

In this report we’ve created a linear algebra API using OpenCL, for use with MATLAB. We’ve demonstrated that the individual linear algebra components can be faster on the GPU than on the CPU. We found that the API is heavily memory-bound, but still faster than MATLAB in our test case. The API components […]
Sep, 19

Direct GPU/FPGA Communication Via PCI Express

Parallel processing has hit mainstream computing in the form of CPUs, GPUs and FPGAs. While explorations proceed with all three platforms individually and with the CPU-GPU pair, little exploration has been performed on the GPU-FPGA pairing. This is due in part to the cumbersome nature of communication between the two. This paper presents a […]
Sep, 19

Simulating spiking neural networks on GPU

Modern graphics cards contain hundreds of cores that can be programmed for intensive calculations. They are beginning to be used for spiking neural network simulations. The goal is to make parallel simulation of spiking neural networks available to a large audience, without requiring a cluster. We review the ongoing efforts towards this goal, […]
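
As a minimal illustration of why these simulations map well to GPUs (not any particular simulator's code), the sketch below updates a population of leaky integrate-and-fire neurons with one thread per neuron; synaptic propagation and spike recording are omitted, and the parameters and input currents are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per neuron: leaky integrate-and-fire update for one time step.
//   dv/dt = (v_rest - v + I) / tau ; spike and reset when v crosses threshold.
__global__ void lifStep(float* v, const float* I, int* spiked, int n,
                        float dt, float tau, float vRest, float vThresh, float vReset) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float vi = v[i] + dt * ((vRest - v[i]) + I[i]) / tau;
    if (vi >= vThresh) { spiked[i] = 1; vi = vReset; } else { spiked[i] = 0; }
    v[i] = vi;
}

int main() {
    const int n = 100000, steps = 1000;
    const float dt = 0.1f, tau = 20.f, vRest = -70.f, vThresh = -54.f, vReset = -70.f;
    float *v, *I; int* spiked;
    cudaMallocManaged(&v, n * sizeof(float));
    cudaMallocManaged(&I, n * sizeof(float));
    cudaMallocManaged(&spiked, n * sizeof(int));
    for (int i = 0; i < n; ++i) { v[i] = vRest; I[i] = 20.f + (i % 10); }  // constant input drive
    for (int t = 0; t < steps; ++t)
        lifStep<<<(n + 255) / 256, 256>>>(v, I, spiked, n, dt, tau, vRest, vThresh, vReset);
    cudaDeviceSynchronize();
    printf("v[0] = %f, spiked[0] = %d\n", v[0], spiked[0]);
    return 0;
}
```
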
Sep, 19

Parallelization of a Block-Matching Algorithm

In this work we present a parallelization technique, together with its GPU implementation, for the full-search block-matching algorithm. This problem consists of finding the block that best matches a given reference template, in terms of some photometric measure, within a predefined search area. Block matching is a fundamental processing step for many signal-processing applications. Its […]
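
A minimal CUDA sketch of the full-search idea (not the authors' implementation): one thread per candidate displacement computes the sum of absolute differences (SAD) against the reference block, and the host then picks the minimum. Frame contents, block size, and search range are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define B 16        // block (template) size
#define S 32        // search range: displacements in [-S, S]

// One thread per candidate displacement (dx, dy): SAD between the reference
// block and the block at (rx+dx, ry+dy) in the current frame.
__global__ void fullSearchSAD(const unsigned char* frame, const unsigned char* ref,
                              int width, int rx, int ry, float* sad) {
    int dx = (int)(blockIdx.x * blockDim.x + threadIdx.x) - S;
    int dy = (int)(blockIdx.y * blockDim.y + threadIdx.y) - S;
    if (dx > S || dy > S) return;
    float sum = 0.f;
    for (int y = 0; y < B; ++y)
        for (int x = 0; x < B; ++x)
            sum += fabsf((float)frame[(ry + dy + y) * width + (rx + dx + x)]
                       - (float)ref[y * B + x]);
    sad[(dy + S) * (2 * S + 1) + (dx + S)] = sum;
}

int main() {
    const int width = 640, height = 480, rx = 300, ry = 200;
    const int nCand = (2 * S + 1) * (2 * S + 1);
    unsigned char *frame, *ref; float* sad;
    cudaMallocManaged(&frame, width * height);
    cudaMallocManaged(&ref, B * B);
    cudaMallocManaged(&sad, nCand * sizeof(float));
    for (int i = 0; i < width * height; ++i) frame[i] = (unsigned char)(i % 251);  // dummy frame
    for (int y = 0; y < B; ++y)                       // reference block cut out of the frame itself
        for (int x = 0; x < B; ++x) ref[y * B + x] = frame[(ry + 5 + y) * width + (rx + 3 + x)];
    dim3 block(16, 16), grid((2 * S + 1 + 15) / 16, (2 * S + 1 + 15) / 16);
    fullSearchSAD<<<grid, block>>>(frame, ref, width, rx, ry, sad);
    cudaDeviceSynchronize();
    int best = 0;
    for (int i = 1; i < nCand; ++i) if (sad[i] < sad[best]) best = i;
    printf("best displacement: dx=%d dy=%d (SAD=%f)\n",
           best % (2 * S + 1) - S, best / (2 * S + 1) - S, sad[best]);
    return 0;
}
```
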
Sep, 19

Beauty And The Beast: Exploiting GPUs In Haskell

In this paper we compare a Haskell system that exploits a GPU back end using Obsidian against a number of other GPU/parallel processing systems. Our examples demonstrate two major results. Firstly, they show that the Haskell system allows the application programmer to exploit GPUs in a manner that eases the development of parallel code by […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors