The Explicit Finite Element Method is a powerful tool in nonlinear dynamic finite element analysis. Recent major developments in computational devices, in particular General Purpose Graphics Processing Units (GPGPUs), now make it possible to increase the performance of the explicit FEM. This dissertation investigates existing explicit finite element method algorithms which are then redesigned for […]

January 22, 2016 by hgpu

There is a clear trend nowadays to use heterogeneous high-performance computers, as they offer considerably greater computing power than homogeneous CPU systems. Extending traditional CPU systems with specialized units (accelerators such as GPGPUs) has become a revolution in the HPC world. Both the traditional performance-per-Watt and the performance-per-Euro ratios have been increased with the use […]

January 4, 2016 by hgpu

In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards, on dual Intel Xeon CPUs and the Intel Xeon Phi (Knights Corner) platforms. All of our […]

November 12, 2015 by hgpu

In this dissertation, the hardware and API architectures of GPUs are investigated, and the corresponding acceleration techniques are applied to the traditional frequency-domain finite element method (FEM), the element-level time-domain methods, and the nonlinear discontinuous Galerkin method. First, the assembly and the solution phases of the FEM are parallelized and mapped onto the granular […]

October 31, 2015 by hgpu

EMMA is a cosmological simulation code aimed at investigating the reionization epoch. It simultaneously handles collisionless and gas dynamics, as well as radiative transfer physics using a moment-based description with the M1 approximation. Field quantities are stored and computed on an adaptive 3D mesh and the spatial resolution can be dynamically modified based on physically motivated […]

September 3, 2015 by hgpu

Modern high performance computing environments are composed of networks of compute nodes that often contain a variety of heterogeneous compute resources, such as multicore-CPUs, GPUs, and coprocessors. One challenge faced by domain scientists is how to efficiently use all these distributed, heterogeneous resources. In order to use the GPUs effectively, the workload parallelism needs to […]

August 5, 2015 by hgpu

PIC methods are among the most widely used methods in plasma simulations. We present a comprehensible evaluation of the PIC code performance on four current parallel platforms: IBM PowerPC, Intel Nehalem (SMP), Intel Sandy Bridge (SMP) and ARM GPU. The behavior of computational algorithms and data structures is analyzed to deduce which code optimizations will […]

July 29, 2015 by hgpu

The Pierre Auger Observatory is currently the largest experiment dedicated to unveiling the nature and origin of the highest-energy cosmic rays. The software framework ‘Offline’ has been developed by the Pierre Auger Collaboration for joint analysis of data from different independent detector systems used in one observatory. While reconstruction modules are specific to the […]

July 29, 2015 by hgpu

Many eigenvalue and eigenvector algorithms begin by reducing the input matrix to tridiagonal form. A tridiagonal matrix is a matrix that has non-zero elements only on its main diagonal and the two diagonals directly adjacent to it. Reducing a matrix to tridiagonal form is an iterative process which uses Jacobi rotations to reduce […]

April 4, 2015 by hgpu

This thesis, entitled "High Performance Computing for solving large sparse systems. Optical Diffraction Tomography as a case of study", investigates the computational issues related to the solution of linear systems of equations which come from the discretization of physical models described by means of Partial Differential Equations (PDEs). These physical models are conceived for the […]

March 30, 2015 by hgpu

We develop an efficient parallel algorithm for answering shortest-path queries in planar graphs and implement it on a multi-node CPU/GPU cluster. The algorithm uses a divide-and-conquer approach for decomposing the input graph into small and roughly equal subgraphs and constructs a distributed data structure containing shortest distances within each of those subgraphs and between their […]

March 28, 2015 by hgpu

This paper describes recent progress towards porting a Unified Flow Solver (UFS) to heterogeneous parallel computing. UFS is an adaptive kinetic-fluid simulation tool, which combines Adaptive Mesh Refinement (AMR) with automatic cell-by-cell selection of kinetic or fluid solvers based on continuum breakdown criteria. The main challenge of porting UFS to graphics processing units (GPUs) comes […]

March 3, 2015 by hgpu