Posts
Jan, 25
Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures
Most recent HPC platforms have heterogeneous nodes composed of a combination of multi-core CPUs and accelerators, like GPUs. Scheduling on such architectures relies on a static partitioning and cost model. In this paper, we present a locality-aware work stealing scheduler for multi-CPU and multi-GPU architectures, which relies on the XKaapi runtime system. We show performance […]
Jan, 25
Vlasov on GPU (VOG Project)
This work concerns the numerical simulation of the Vlasov-Poisson set of equations using semi-Lagrangian methods on Graphical Processing Units (GPU). To accomplish this goal, modifications to traditional methods had to be implemented. First and foremost, a reformulation of semi-Lagrangian methods is performed, which enables us to rewrite the governing equations as a circulant matrix […]
Jan, 25
A GPU-accelerated Direct-sum Boundary Integral Poisson-Boltzmann Solver
In this paper, we present a GPU-accelerated direct-sum boundary integral method to solve the linear Poisson-Boltzmann (PB) equation. In our method, a well-posed boundary integral formulation is used to ensure fast convergence of Krylov-subspace-based linear algebraic solvers such as GMRES. The molecular surfaces are discretized with flat triangles and centroid collocation. […]
Jan, 24
High Performance Lattice Boltzmann Solvers on Massively Parallel Architectures with Applications to Building Aeraulics
With the advent of low-energy buildings, the need for accurate building performance simulations has significantly increased. However, for the time being, the thermo-aeraulic effects are often taken into account through simplified or even empirical models, which fail to provide the expected accuracy. Resorting to computational fluid dynamics seems therefore unavoidable, but the required computational effort […]
Jan, 24
Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems
We discuss three complementary approaches that can provide both portability and an increased level of abstraction for the programming of heterogeneous multicore systems. Together, these approaches also support performance portability, as currently investigated in the EU FP7 project PEPPHER. In particular, we consider (1) a library-based approach, here represented by the integration of the SkePU […]
Jan, 24
Hybrid Single/Double Precision Floating-Point Computation on GPU Accelerators for 2-D FDTD
Acceleration of FDTD (Finite-Difference Time-Domain) is very important in computational electromagnetics. We propose a hybrid single/double precision floating-point computation to accelerate FDTD on GPUs. We apply single precision when the dynamic range of the electromagnetic field is low and double precision when the dynamic range is high. According to the experimental results, we achieved over 35 times […]
Jan, 24
Developing and Evaluating clOpenCL Applications for Heterogeneous Clusters
In the last few years, the processing capabilities of computing systems have increased significantly, moving from single-core to multi-core and even many-core systems. Accompanying this evolution, local networks have also become faster, with multi-gigabit technologies like InfiniBand, Myrinet and 10G Ethernet. Parallel/distributed programming tools and standards, like POSIX Threads, OpenMP and MPI, have helped to explore […]
Jan, 24
Performance Study of Satellite Image Processing on Graphics Processors Unit Using CUDA
High-resolution satellite images are now widely used for a variety of mapping applications including photogrammetry, GIS data acquisition and visualization. As the spectral and spatial data size of satellite images increases, greater processing power is needed to process the images. Parallel systems offer a solution to this problem. Parallel processing techniques have been […]
Jan, 24
GPU-based 3D Wavelet Transform
A wide range of applications, such as volumetric medical data compression, video watermarking and video coding, use the three-dimensional wavelet transform (3D-DWT) in their algorithms. In this work, we present GPU algorithms, based on both global and shared memory, to compute the 3D-DWT on both the GTX280 and the GMT540 platforms. The results obtained show that […]
Jan, 23
Reducing GPU Offload Latency via Fine-Grained CPU-GPU Synchronization
GPUs are seeing increasingly widespread use for general-purpose computation due to their excellent performance for highly parallel, throughput-oriented applications. For many workloads, however, the performance benefits of offloading are hindered by the large and unpredictable overheads of launching GPU kernels and of transferring data between CPU and GPU. This paper proposes and evaluates hardware and […]
Jan, 23
The effects of nutrient chemotaxis on bacterial aggregation patterns with non-linear degenerate cross diffusion
This paper introduces a reaction-diffusion-chemotaxis model for bacterial aggregation patterns on the surface of thin agar plates. It is based on the non-linear degenerate cross diffusion model proposed by Kawasaki et al. (J. of Theor. Biol. 188(2) 1997) and it includes a suitable nutrient chemotactic term compatible with this type of diffusion. High-resolution numerical […]
Jan, 23
A survey on various computationally intensive parallel applications in High performance Computing System with OpenCL-MPI
As we are in the development phase of our own supercomputer, we have identified several applications that are too computationally intensive for a normal desktop computer to solve. These applications span multiple disciplines, such as biomedicine, mathematics, fluid dynamics and genetic algorithms. We are actually identifying the parallel computations involved in […]