Posts
Jul, 9
Parallelization Strategies for Local Search Algorithms on Graphics Processing Units
The purpose of this paper is to propose effective parallelization strategies for Local Search algorithms on Graphics Processing Units (GPU). We consider the distribution of the 3-opt neighborhood structure embedded in the Iterated Local Search framework. Three resulting approaches are evaluated and compared on both speedup and solution quality on a state-of-the-art Fermi GPU architecture. […]
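To distribute a neighborhood over GPU threads, each candidate move gets a flat index. Each thread decodes its index into a move and evaluates it. The sketch below is a minimal Python illustration of that mapping for 3-opt. It assumes tour positions i < j < k and covers only one of the possible reconnections: swapping the two middle segments without reversing them. It is not the paper's kernel or any of its three approaches. All helper names are hypothetical, and a serial loop stands in for the GPU threads.

```python
import math
from math import comb


def unrank_triple(idx, n):
    """Map flat index idx in [0, C(n,3)) to the idx-th triple i < j < k < n (lexicographic)."""
    if not 0 <= idx < comb(n, 3):
        raise ValueError("index out of range")
    i = 0
    # There are C(n-i-1, 2) triples whose first element is i; skip whole blocks.
    while idx >= comb(n - i - 1, 2):
        idx -= comb(n - i - 1, 2)
        i += 1
    j = i + 1
    # For a fixed (i, j) there are n-j-1 choices of k.
    while idx >= n - j - 1:
        idx -= n - j - 1
        j += 1
    return i, j, j + 1 + idx


def euclidean_matrix(points):
    """Symmetric distance matrix for a list of (x, y) points."""
    return [[math.dist(p, q) for q in points] for p in points]


def tour_length(tour, dist):
    """Closed-tour length, including the edge from the last city back to the first."""
    n = len(tour)
    return sum(dist[tour[p]][tour[(p + 1) % n]] for p in range(n))


def move_delta(tour, dist, i, j, k):
    """Length change of the segment-swap move that removes the edges after positions i, j, k."""
    n = len(tour)
    a, b = tour[i], tour[i + 1]
    c, d = tour[j], tour[j + 1]
    e, f = tour[k], tour[(k + 1) % n]
    return (dist[a][d] + dist[e][b] + dist[c][f]) - (dist[a][b] + dist[c][d] + dist[e][f])


def apply_move(tour, i, j, k):
    """Reconnect as a -> d..e -> b..c -> f: the two middle segments swap, neither is reversed."""
    return tour[:i + 1] + tour[j + 1:k + 1] + tour[i + 1:j + 1] + tour[k + 1:]


def best_move(tour, dist):
    """Serial stand-in for a GPU kernel: on a GPU, each idx would be evaluated by one thread."""
    n = len(tour)
    best = (0.0, None)
    for idx in range(comb(n, 3)):
        delta = move_delta(tour, dist, *unrank_triple(idx, n))
        if delta < best[0]:
            best = (delta, idx)
    return best
```

For example, `best_move(list(range(8)), euclidean_matrix(points))` returns the most negative delta and its flat index. It returns `(0.0, None)` when no move of this type improves the tour. Because every index is evaluated independently, the loop body maps naturally onto one thread per index, followed by a reduction to find the best move.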
Jul, 9
Discontinuous Galerkin Methods on Graphics Processing Units for Nonlinear Hyperbolic Conservation Laws
We present an implementation of the discontinuous Galerkin (DG) method for hyperbolic conservation laws in two dimensions on graphics processing units (GPUs) using NVIDIA’s Compute Unified Device Architecture (CUDA). Both flexible and highly accurate, DG methods accommodate parallel architectures well, as their discontinuous nature produces entirely element-local approximations. High performance scientific computing suits GPUs well, […]
Jul, 9
Computation of the Isogeometric Analysis Stiffness Matrix on GPU
Owing to the high regularity of its basis functions across mesh elements, isogeometric analysis achieves, among other benefits, higher accuracy per degree of freedom and improved spectral properties compared with finite element analysis. However, this same feature makes the stiffness matrix less sparse and requires more elaborate numerical integration schemes to compute it. […]
Jul, 9
Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture
Query co-processing on graphics processors (GPUs) has become an effective means of improving the performance of main memory databases. However, the relatively low bandwidth and high latency of the PCI-e bus are usually the bottleneck for co-processing. Recently, coupled CPU-GPU architectures have received a lot of attention, e.g. AMD APUs with the CPU and the […]
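The excerpt does not describe how the join is divided between CPU and GPU. As background, a hash join runs in two phases. It first builds a hash table on one relation's join key, then probes that table with each row of the other relation. The sketch below shows only this generic algorithm in Python. The function name and the list-of-dicts row format are hypothetical, and nothing here reflects the paper's co-processing scheme.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dicts with the classic build/probe hash join."""
    # Build phase: hash the (ideally smaller) relation on its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: look up each probe row; every match yields one merged output row.
    out = []
    for row in probe_rows:
        # .get avoids inserting empty buckets into the defaultdict for missing keys.
        for match in table.get(row[probe_key], ()):
            out.append({**match, **row})
    return out
```

On a coupled architecture, both phases could in principle be partitioned across CPU and GPU cores. This excerpt does not state which split the paper evaluates.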
Jul, 8
A Smart GPU Implementation of an Elliptic Kernel for an Ocean Global Circulation Model
In this paper, we analyze the preconditioning of an elliptic Laplace problem in an ocean global circulation model. We suggest an inverse preconditioning technique to compute the numerical solution of the elliptic kernel efficiently. Moreover, we show how the convergence rate and the performance of the solver are closely linked to the […]
Jul, 8
ParadisEO-MO-GPU: a Framework for Parallel GPU-based Local Search Metaheuristics
In this paper, we propose a pioneering framework called ParadisEO-MO-GPU for the reusable design and implementation of parallel local search metaheuristics (S-Metaheuristics) on Graphics Processing Units (GPU). We revisit the ParadisEO-MO software framework to enable its use on GPU accelerators, focusing on the parallel iteration-level model, the major parallel model for S-Metaheuristics. It consists in […]
Jul, 8
Coalition Structure Generation with the Graphic Processor Unit
Coalition Structure Generation (the problem of finding the optimal set of coalitions) has received considerable attention in recent AI literature. The fastest exact algorithm to solve this problem is IDP-IP*, due to Rahwan et al. (2012). This algorithm is a hybrid of two previous algorithms, namely IDP and IP. As such, it is desirable to […]
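IDP-IP* builds on IDP (improved dynamic programming). For background only, the sketch below shows the classic dynamic program that IDP improves on. It is not IDP, IP, or IDP-IP* themselves. The best value of a coalition C is the larger of v(C) and the best value obtainable by splitting C into two parts. Coalitions are bitmasks over n agents, and the characteristic function `v` is a hypothetical callable supplied by the caller. The nested subset enumeration runs in O(3^n) time.

```python
def optimal_coalition_structure(n, v):
    """Classic DP over coalition bitmasks: f(C) = max(v(C), max over splits f(S) + f(C - S))."""
    full = (1 << n) - 1
    f = [0.0] * (full + 1)
    best_split = [0] * (full + 1)  # 0 means "keep C as a single coalition"
    for c in range(1, full + 1):  # proper subsets of c are numerically smaller, so already solved
        f[c] = v(c)
        # Enumerate non-empty proper subsets s of c; keep only those containing c's lowest
        # agent so that each unordered split {s, c - s} is examined exactly once.
        low = c & -c
        s = (c - 1) & c
        while s:
            if s & low:
                val = f[s] + f[c ^ s]
                if val > f[c]:
                    f[c], best_split[c] = val, s
            s = (s - 1) & c

    def expand(c):
        """Recover the coalition structure by following the recorded splits."""
        s = best_split[c]
        return [c] if s == 0 else expand(s) + expand(c ^ s)

    return f[full], expand(full)
```

Every split of C uses strictly smaller coalitions. Once all smaller sizes are done, all coalitions of the same size could therefore be evaluated independently. A GPU can spread this layer-by-layer work over threads, though this excerpt does not state how the paper parallelizes it.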
Jul, 8
Comparison and Analysis of GPU Energy Efficiency For CUDA and OpenCL
The use of GPUs for processing large sets of parallelizable data has increased sharply in recent years. As the concept of GPU computing is still relatively young, parameters other than computation time, such as energy efficiency, are being overlooked. Two parallel computing platforms, CUDA and OpenCL, provide developers with an interface that they can use […]
Jul, 8
GPU Implementation of Real-Time Biologically Inspired Face Detection using CUDA
This paper presents massively parallel, real-time face detection based on the visual attention and cortex-like mechanisms of a cognitive vision system. As a first step, we use a saliency map model to select salient face regions and the HMAX C1 model to extract features from the salient input image, and then apply a mixture-of-experts neural network […]
Jul, 7
Comparison of Rectangular Matrix Multiplication with and without Border Conditions
Matrix multiplication algorithms are widely used for computation in almost every field. There are many implementations of matrix multiplication for different platforms and programming models. In recent years, GPU devices have become powerful computational units that have entered the segment of high performance computing. In this paper, we analyse two […]
Jul, 7
Solving 3D Anisotropic Elastic Wave Equations on Parallel GPU Devices
Efficiently modelling seismic datasets in complex 3D anisotropic media by solving the 3D elastic wave equation is an important challenge in computational geophysics. Using a stress-stiffness formulation on a regular grid, we present a 3D finite-difference time-domain (FDTD) solver using a 2nd-order temporal and 8th-order spatial accuracy stencil that leverages the massively parallel architecture of […]
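Here is a 1D scalar analogue of a stencil with 2nd-order temporal and 8th-order spatial accuracy. It discretizes the acoustic wave equation u_tt = c^2 u_xx using the standard 8th-order central-difference second derivative in space and a second-order leapfrog step in time. This is a simplification that assumes a uniform grid and zeroes the edge points as a placeholder boundary. The paper's 3D anisotropic elastic solver, with its stress-stiffness formulation, is considerably more involved, and this excerpt does not give its exact stencil. The function names are hypothetical.

```python
import math

# Standard 8th-order central-difference weights for d2u/dx2 at offsets 0..4 (symmetric stencil).
W8 = (-205 / 72, 8 / 5, -1 / 5, 8 / 315, -1 / 560)


def d2_dx2_8th(u, h):
    """8th-order second derivative at interior points; the 4 points at each end are left at 0."""
    n = len(u)
    out = [0.0] * n
    for i in range(4, n - 4):
        acc = W8[0] * u[i]
        for m in range(1, 5):
            acc += W8[m] * (u[i + m] + u[i - m])
        out[i] = acc / (h * h)
    return out


def leapfrog_step(u_prev, u_curr, c, h, dt):
    """2nd-order-in-time update for u_tt = c^2 u_xx: u_next = 2u - u_prev + (c dt)^2 u_xx."""
    lap = d2_dx2_8th(u_curr, h)
    k = (c * dt) ** 2
    u_next = [2 * a - b + k * l for a, b, l in zip(u_curr, u_prev, lap)]
    # Placeholder boundary (assumption): zero the 4-point halo at each end.
    # A real solver needs proper boundary conditions or absorbing layers.
    n = len(u_next)
    for i in list(range(4)) + list(range(n - 4, n)):
        u_next[i] = 0.0
    return u_next
```

Each interior update reads 9 neighboring values per axis. In 3D this wide, regular access pattern is what makes such stencils a good fit for GPU shared-memory tiling.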
Jul, 7
A Comparative Study of Neighborhood Filters for Artifact Reduction in Iterative Low-Dose CT
Iterative CT algorithms have become increasingly popular in recent years. They have been found useful when the projections are limited in number, irregularly spaced, or noisy, which are often encountered in low-dose CT imaging. One way to cope with the associated streak and noise artifacts is to interleave a regularization objective into the iterative reconstruction […]
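The excerpt does not name the filters compared. The bilateral filter is one common example of a neighborhood filter. It replaces each pixel with a weighted average of its neighbors, where the weight decays with both spatial distance and intensity difference, so edges are smoothed less than flat regions. The Python sketch below assumes a 2D image stored as a list of lists of floats. The function name and the defaults for `radius`, `sigma_s`, and `sigma_r` are illustrative, not taken from the paper.

```python
import math


def bilateral_filter(img, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Bilateral filter: weights combine spatial distance and intensity difference."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            center = img[y][x]
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < rows and 0 <= xx < cols:  # skip neighbors outside the image
                        v = img[yy][xx]
                        # Spatial Gaussian times range (intensity) Gaussian.
                        w = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2)
                                     - (v - center) ** 2 / (2 * sigma_r ** 2))
                        num += w * v
                        den += w
            out[y][x] = num / den  # den > 0 because the center pixel always has weight 1
    return out
```

Applying a filter like this to the current image estimate between reconstruction iterations is one plausible way to interleave it with an iterative scheme. This is an assumption, not the procedure stated in the excerpt.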