Posts
Jul, 12
Scalable Techniques for Scheduling and Mapping DSP Applications onto Embedded Multiprocessor Platforms
A variety of multiprocessor architectures has proliferated even for off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using special platform features to achieve high performance. However, due to the fast evolution of the underlying architectures, solution redevelopment is error-prone and re-usability of […]
Jul, 10
Evaluating different Java bindings for OpenCL
The traditional CPU is able to run only a few complex threads concurrently. By contrast, a GPU (Graphics Processing Unit) allows concurrent execution of hundreds or thousands of simpler threads. The GPU was originally designed for computer graphics, but nowadays it is being used for general-purpose computation using a GPGPU (General Purpose GPU) […]
Jul, 10
Modelling sea water intrusion in coastal aquifers using heterogeneous computing
The objective of this PhD research program is to investigate numerical methods for simulating variably-saturated flow and sea water intrusion in coastal aquifers in a high-performance computing environment. The work is divided into three overlapping tasks: to develop an accurate and stable finite volume discretisation and numerical solution strategy for the variably-saturated flow and salt […]
Jul, 10
Meshfree/GFEM in hardware-efficiency prospective
A fundamental trend of processor architecture evolving towards exaflops is rapidly increasing floating point performance (so-called "free" flops) accompanied by much more slowly increasing memory and network bandwidth. In order to fully enjoy the "free" flops, a numerical algorithm for PDEs should request more flops per byte, that is, increase its arithmetic intensity. A meshfree/GFEM approximation can be […]
Jul, 10
DistCL: A Framework for the Distributed Execution of OpenCL Kernels
GPUs are used to speed up many scientific computations; however, to use several networked GPUs concurrently, the programmer must explicitly partition work and transmit data between devices. We propose DistCL, a novel framework that distributes the execution of OpenCL kernels across a GPU cluster. DistCL makes multiple distributed compute devices appear to be a single […]
Jul, 10
Exploiting Data Parallelism in the yConvex Hypergraph Algorithm for Image Representation using GPGPUs
To define and identify a region-of-interest (ROI) in a digital image, the shape descriptor of the ROI has to be described in terms of its boundary characteristics. To address the generic issues of contour tracking, the yConvex Hypergraph (yCHG) model was proposed by Kanna et al. [1]. In this work, we propose a parallel approach […]
Jul, 9
Hybrid Scheduling for Event-driven Simulation over Heterogeneous Computers
In this work we propose a new scheduling approach designed from scratch to maximize both the usage of heterogeneous computers and the event processing flow. The scheduler is built on three fundamental concepts which introduce a new vision of discrete event simulation: 1) events are clustered according to their potential time parallelism on […]
Jul, 9
Parallelization Strategies for Local Search Algorithms on Graphics Processing Units
The purpose of this paper is to propose effective parallelization strategies for Local Search algorithms on Graphics Processing Units (GPU). We consider the distribution of the 3-opt neighborhood structure embedded in the Iterated Local Search framework. Three resulting approaches are evaluated and compared on both speedup and solution quality on a state-of-the-art Fermi GPU architecture. […]
Jul, 9
Discontinuous Galerkin Methods on Graphics Processing Units for Nonlinear Hyperbolic Conservation Laws
We present an implementation of the discontinuous Galerkin (DG) method for hyperbolic conservation laws in two dimensions on graphics processing units (GPUs) using NVIDIA’s Compute Unified Device Architecture (CUDA). Both flexible and highly accurate, DG methods accommodate parallel architectures well, as their discontinuous nature produces entirely element-local approximations. High performance scientific computing suits GPUs well, […]
Jul, 9
Computation of the Isogeometric Analysis Stiffness Matrix on GPU
Due to the high regularity across mesh elements of isogeometric analysis, this new method achieves higher accuracy per degree of freedom and improved spectrum properties, among others, compared to finite element analysis. However, this inherent feature of isogeometric analysis reduces the sparsity of the stiffness matrix and requires more elaborate numerical integration schemes for its computation. […]
Jul, 9
Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture
Query co-processing on graphics processors (GPUs) has become an effective means to improve the performance of main memory databases. However, the relatively low bandwidth and high latency of the PCI-e bus are usually bottleneck issues for co-processing. Recently, coupled CPU-GPU architectures have received a lot of attention, e.g. AMD APUs with the CPU and the […]
Jul, 8
A Smart GPU Implementation of an Elliptic Kernel for an Ocean Global Circulation Model
In this paper, the preconditioning technique of an elliptic Laplace problem in a global circulation ocean model is analyzed. We suggest an inverse preconditioning technique in order to efficiently compute the numerical solution of the elliptic kernel. Moreover, we show how the convergence rate and the performance of the solver are strictly linked to the […]