Posts
Jul, 16
Object support for OpenMP-style programming of GPU clusters in Java
For scientists, it is advantageous to use a high level of abstraction for programming their simulations, so that they can focus on the problem at hand instead of struggling with low-level details. However, current HPC clusters with multiple GPUs per node only offer explicit communication to and from the GPUs, require manual work to keep […]
Jul, 16
Efficient algorithms for the realistic simulation of fluids
Nowadays there is great demand for realistic simulations in the computer graphics field. Physically-based animations are commonly used, and one of the more complex problems in this field is fluid simulation, more so if real-time applications are the goal. Videogames, in particular, resort to different techniques that, in order to represent fluids, just simulate the […]
Jul, 15
Hydrodynamic Computation with Hybrid Programming on CPU-GPU Clusters
The explosion of parallelism and heterogeneity in today’s computer architectures has created opportunities as well as challenges for redesigning legacy numerical software to harness the power of new hardware. In this paper we address the main challenges in redesigning BLAST – a numerical library that solves the equations of compressible hydrodynamics using high order finite […]
Jul, 15
A framework for cost based optimization of hybrid CPU/GPU query plans in database systems
Current database research has identified the computational power of GPUs as a way to increase the performance of database systems. As GPU algorithms are not necessarily faster than their CPU counterparts, it is important to use the GPU only if it is beneficial for query processing. In a general database context, only a few research […]
Jul, 15
KernelGen – the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs
GPUs are becoming pervasive in scientific computing. Originally serving as peripheral accelerators, they are now gradually turning into central computing nodes. However, most current directive-based approaches for parallelizing sequential legacy code, such as OpenACC and HMPP, simply off-load "hot" CPU code onto GPUs, entailing many limitations such as unsupported external calls and coarse-grained […]
Jul, 15
Parallelization of SAT Algorithms on GPUs
The Boolean Satisfiability Problem is one of the most important problems in computer science, with applications spanning many areas of research. Despite this importance and the extensive study and improvements that have been made, no efficient solution to the problem has been found to date. In recent years, NVIDIA introduced CUDA, a platform […]
Jul, 15
CUDA-C implementation of the ADER-DG method for linear hyperbolic PDEs
We implement the ADER-DG numerical method in the CUDA-C language to run the code on a Graphics Processing Unit (GPU). We focus on solving linear hyperbolic partial differential equations, where the method can be expressed as a combination of precomputed matrix multiplications, making it a good candidate for GPU hardware. Moreover, the […]
Jul, 15
GPU Based Implementation of Recursive Digital Filtering Algorithms
Recursive filtering is widely used in many signal processing applications. Speeding up the computation of recursive filtering using many processing elements is difficult because of the dependency problem. In this paper, massively parallel computation of recursive filtering algorithms using GPGPUs (General Purpose Graphics Processing Units) is studied. The proposed method uses the multi-block parallel processing algorithm, […]
Jul, 15
Exploiting Space and Time Coherence in Grid-based Sorting
In recent years, many approaches for real-time simulation of physical phenomena using particles have been proposed. Many of these use 3D grids for representing spatial distributions and employ a collision detection technique where particles must be sorted with respect to the cells they occupy. In this paper we propose several techniques that make it possible […]
Jul, 15
Near-LSPA Performance at MSA Complexity
The tradeoff between error-correcting performance and numerical complexity of LDPC decoding algorithms is a well-known problem. In this paper we present the previously unseen error-floor performance of the Self-Corrected Min-Sum algorithm for long DVB-S2 codes. We developed a massively parallel simulation using GPUs, which allowed a comprehensive BER characterization either in the waterfall or in […]
Jul, 14
Equilibrium and Non-Equilibrium Ising Models by Means of PCA
We propose a unified approach to reversible and irreversible PCA dynamics, and we show that in the case of 1D and 2D nearest-neighbour Ising systems with periodic boundary conditions we are able to compute the stationary measure of the dynamics even when the latter is irreversible. We also show how, according to [DPSS12], the […]
Jul, 14
Benchmarking Intel Xeon Phi to Guide Kernel Design
With a minimum of 50 cores, Intel’s Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two levels of cache, and a very fast interconnect, the Xeon Phi is able to achieve a theoretical peak of 1000 GFLOPS and over 240 GB/s. These numbers, as well as its flexibility – it can be used […]