Posts
Sep, 24
A Hardware-Accelerated Parallel Implementation of a Two-Dimensional Scheme for Free Surface Flows
This contribution concerns the verification and performance assessment of a hardware-accelerated parallel implementation of a semi-implicit finite difference algorithm for solving the vertically integrated shallow water equations, including a non-linear treatment of wetting and drying and conservative advection schemes. Instead of adapting an existing serial, OpenMP-, or MPI-parallelised code with all necessary […]
Sep, 24
ACO on Multiple GPUs with CUDA for Faster Solution of QAPs
In this paper, we implement ACO algorithms on a PC with four GTX 480 GPUs. We implement two types of ACO models: the island model and the master/slave model. Compared with the master/slave model, the island model shows promising speedup values on class (iv) QAP instances. On the other […]
Sep, 24
GPU-based Offset Surface Computation using Point Samples
We present an efficient algorithm to perform approximate offsetting operations on geometric models using GPUs. Our approach approximates the boundary of an object with point samples and computes the offset by merging the balls centered at these points. The underlying approach uses Layered Depth Images (LDI) to organize the samples into structured points and performs […]
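The idea of merging balls centered at point samples can be illustrated with a brute-force membership test. This is only a minimal sketch and not the LDI-based GPU method the abstract describes: the function name `in_ball_union`, the sample coordinates, and the radius are all hypothetical, and the loop runs on the CPU over every sample.

```python
import math

def in_ball_union(query, samples, radius):
    """Return True if `query` lies inside at least one ball of the given
    radius centered at a sample point (i.e. inside the union of balls)."""
    return any(math.dist(query, s) <= radius for s in samples)

# Two hypothetical boundary samples and an offset distance of 0.5.
samples = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(in_ball_union((0.3, 0.0, 0.0), samples, 0.5))  # covered by the first ball
print(in_ball_union((0.5, 0.6, 0.0), samples, 0.5))  # outside both balls
```

The boundary of this union approximates the offset surface at distance `radius` from the sampled boundary; a practical implementation would need a spatial structure to avoid testing every sample.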
Sep, 23
Exploring Multi-level Parallelism for Large-Scale Spiking Neural Networks
Several biologically inspired applications have been motivated by Spiking Neural Networks (SNNs) such as the Hodgkin-Huxley (HH) and Izhikevich models, owing to their high biological accuracy. The inherently massively parallel nature of SNN simulations makes them a good fit for heterogeneous computing resources such as General-Purpose Graphics Processing Unit (GPGPU) clusters. In […]
Sep, 23
Adaptive Treelet Meshes for Efficient Streak-Surface Visualization on the GPU
We describe a novel adaptive mesh representation for streak-surfaces. The surface is represented as a mesh of small trees of initial depth zero (treelets). This mesh representation allows for efficient integration, refinement, coarsening and appending of surface patches utilizing the computational capacities of modern GPUs. Integration, refinement, and rendering are strictly separated into effectively parallelizable […]
Sep, 23
Task Performance with List-Mode Data
This dissertation investigates the application of list-mode data to detection, estimation, and image reconstruction problems, with an emphasis on emission tomography in medical imaging. We begin by introducing a theoretical framework for list-mode data and we use it to define two observers that operate on list-mode data. These observers are applied to the problem of […]
Sep, 23
Computer Vision Application in Graphic Processors
Largely driven by the gaming industry, research and development of hardware tools for the generation of images, such as graphics cards (or GPUs, Graphics Processing Units), has experienced tremendous growth in recent years. The increased power and flexibility and the low price of these GPUs have led to their unexpected use in areas other than graphics. […]
Sep, 23
A Quantitative Study of Irregular Programs on GPUs
GPUs have been used to accelerate many regular applications and, more recently, irregular applications in which the control flow and memory access patterns are data-dependent and statically unpredictable. This paper defines two measures of irregularity called control-flow irregularity and memory-access irregularity, and investigates, using performance-counter measurements, how irregular GPU kernels differ from regular kernels with […]
Sep, 22
Computing of high breakdown regression estimators without sorting on graphics processing units
We present an approach to computing high-breakdown regression estimators in parallel on graphics processing units (GPUs). We show that sorting the residuals is not necessary and that it can be replaced by calculating the median. We present and compare various methods to calculate the median and order statistics on GPUs. We introduce an alternative method based […]
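To make the "median instead of sorting" point concrete, here is a minimal CPU sketch that finds an order statistic by selection (quickselect with three-way partitioning) rather than a full sort. It is not the paper's GPU method. The names `select_kth` and `median_squared_residual` are hypothetical, and using the median of squared residuals as the objective (as in least median of squares) is an assumption chosen for illustration.

```python
def select_kth(values, k):
    """Return the k-th smallest element (0-based) without sorting the list."""
    items = list(values)
    while True:
        pivot = items[len(items) // 2]
        lower = [v for v in items if v < pivot]
        n_equal = sum(1 for v in items if v == pivot)
        if k < len(lower):
            items = lower                      # answer is below the pivot
        elif k < len(lower) + n_equal:
            return pivot                       # answer equals the pivot
        else:
            k -= len(lower) + n_equal          # answer is above the pivot
            items = [v for v in items if v > pivot]

def median_squared_residual(xs, ys, slope, intercept):
    """Order statistic at index n // 2 of the squared residuals of a line fit
    (assumed objective; equals the median when n is odd)."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return select_kth(squared, len(squared) // 2)

# Hypothetical data: four points on y = x plus one gross outlier.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 2, 3, 100]
print(median_squared_residual(xs, ys, 1.0, 0.0))
```

Selection takes linear time on average, compared with O(n log n) for a full sort, which is why it is attractive when only the median or a single order statistic is needed.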
Sep, 22
SpMV: A Memory-Bound Application on the GPU Stuck Between a Rock and a Hard Place
In this paper, we investigate the relative merits of GPGPUs and multicores in the context of sparse matrix-vector multiplication (SpMV). While GPGPUs possess impressive capabilities in terms of raw compute throughput and memory bandwidth, their performance varies significantly with application tuning as well as sparse input and format characteristics. Furthermore, several emerging technological and workload […]
Sep, 22
Overlapping computation and communication of three-dimensional FDTD on a GPU cluster
Large-scale electromagnetic field simulations using the FDTD (finite-difference time-domain) method require the use of GPU (graphics processing unit) clusters. However, the communication overhead caused by slow interconnections becomes a major performance bottleneck. In this paper, as a way to remove the bottleneck, we propose the "kernel-split method" and the "host-buffer method" which overlap computation and […]
Sep, 22
Exploration of Parallelization Frameworks for Computational Finance
This paper presents a comparison of parallelization frameworks for efficient execution of computational finance workloads. We use a Value-at-Risk (VaR) workload to evaluate OpenCL and OpenMP parallelization frameworks on multi-core CPUs as opposed to GPUs. In addition, we study the impact of SMT on performance using GCC (4.4) and IBM XLC (11.01) compilers for both […]