Posts
May, 16
C++ on GPUs Using OpenACC and the PGI Accelerator Compilers, webinar
The fastest supercomputers and clusters use a 64-bit host processor with one or more accelerators per node, most commonly GPUs. These compute accelerators exploit a high degree of parallelism to maximize performance and power efficiency. There are several challenges to effective and productive use of accelerators, the most important of which are managing data movement […]
May, 16
Using GPUs to Accelerate Orthorectification, Atmospheric Correction, and Transformations for Big Data, webinar
Significant improvements in speed for imagery orthorectification, atmospheric correction, and image transformations like Independent Components Analysis (ICA) have been achieved using GPU-based implementations. Additional optimizations, when factored in with GPU processing capabilities, can provide a 50x to 100x reduction in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA-based […]
May, 16
Scaling Coupled Climate Models to Exascale: OpenACC-enabled ECEarth3 Earth System Model
Climate change due to increasing anthropogenic greenhouse gases and land surface change is currently one of the most relevant environmental concerns. It threatens ecosystems and human societies. However, its impact on the economy and our living standards depends largely on our ability to anticipate its effects and take appropriate action. Earth System Models (ESMs), such […]
May, 16
Porting NAHUJ to CUDA
This white paper reports on an enabling effort that involves porting a legacy 2D fluid dynamics Fortran code to NVIDIA GPUs. Given the complexity of both the code and the underlying (custom) numerical method, the natural choice was to use NVIDIA CUDA C to achieve the best possible performance. We achieved over 4.5x speed-up on a single K20 […]
May, 16
Enabling CP2K Application for Exascale Computing with Accelerators using OpenACC and OpenCL
CP2K is an application for atomistic and molecular simulation and, with its excellent scalability, is particularly important with regard to use on future exascale systems. The code is well parallelized using MPI and hybrid MPI/OpenMP, typically scaling well to ~1 core per atom in the system. The research on CP2K done within PRACE-1IP stated that […]
May, 16
Hybrid Use of OmpSs for a Shock Hydrodynamics Proxy Application
The LULESH proxy application models the behavior of the ALE3D multi-physics code with an explicit shock hydrodynamics problem, and is designed to evaluate interactions between programming models and architectures, using a representative code significantly less complex than the application it models. As identified in the PRACE deliverable D7.2.1 [1], the OmpSs programming model […]
May, 16
A Straightforward Preprocessing Approach for Accelerating Convex Hull Computations on the GPU
An effective strategy for accelerating the calculation of convex hulls for point sets is to filter the input points by discarding interior points. In this paper, we present such a straightforward and efficient preprocessing approach by exploiting the GPU. The basic idea behind our approach is to discard the points that are located inside a convex […]
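As a rough illustration of the interior-point filtering idea, here is a minimal CPU-side sketch in Python. The abstract is truncated, so the exact convex shape used by the paper is not known here; this sketch assumes the classic Akl-Toussaint heuristic, which discards points strictly inside the quadrilateral spanned by the four axis-extreme points. The names `filter_interior` and `convex_hull` are hypothetical, and nothing below runs on a GPU.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def filter_interior(points: List[Point]) -> List[Point]:
    """Drop points strictly inside the quadrilateral of extreme points.

    Assumption: Akl-Toussaint style filter, not necessarily the paper's.
    """
    if len(points) < 4:
        return list(points)
    # Extreme points, listed in counter-clockwise order.
    left = min(points)                               # min x (ties: min y)
    bottom = min(points, key=lambda p: (p[1], p[0]))  # min y (ties: min x)
    right = max(points)                              # max x (ties: max y)
    top = max(points, key=lambda p: (p[1], p[0]))     # max y (ties: max x)
    quad = [left, bottom, right, top]

    def strictly_inside(p: Point) -> bool:
        # Strictly left of every edge. A degenerate (zero-length) edge
        # gives cross == 0, so the point is conservatively kept.
        return all(
            cross(quad[i], quad[(i + 1) % 4], p) > 0 for i in range(4)
        )

    return [p for p in points if not strictly_inside(p)]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


pts = [(0, 2), (2, 0), (4, 2), (2, 4), (2, 2), (2, 1), (3, 2),
       (0.5, 0.5), (3.9, 3.9)]
kept = filter_interior(pts)
hull = convex_hull(kept)
```

Because every discarded point lies strictly inside a polygon whose corners are themselves input points, the hull of the filtered set matches the hull of the full set. The per-point test is independent of all other points, which is what makes this kind of filter a natural candidate for data-parallel execution.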
May, 15
Multi-GPGPU Cellular Automata Simulations using OpenACC
The Frisch-Hasslacher-Pomeau (FHP) model is a lattice gas cellular automaton designed to simulate fluid flows using exact, purely Boolean arithmetic, without any round-off error. Here we investigate the problem of its efficient porting to clusters of Fermi-class graphics processing units. To this end two multi-GPU implementations were developed and examined: one using the NVIDIA […]
May, 15
Real-time Image Processing on Low Cost Embedded Computers
In 2012 a federal mandate was imposed that required the FAA to integrate unmanned aerial systems (UAS) into the national airspace (NAS) by 2015 for civilian and commercial use. A significant driver for the increasing popularity of these systems is the rise in open hardware and open software solutions which allow hobbyists to build small […]
May, 15
Parallelization of Shape Diameter Function Computation using OpenCL
Shape Diameter Function (SDF) is a scalar function that expresses a measure of the diameter of the object’s volume in the neighborhood of each point on the surface of an input mesh. It is fundamental in many applications in computer graphics, where it is used for consistent mesh partitioning and skeletonization. The algorithm sends several rays inside a […]
May, 15
Performance Optimization of GPU ELF-Codes
GPUs (Graphics Processing Units) are of interest for their favorable GF/s-to-price ratio. Compared with their beginnings in the early 1980s, today's GPU architectures are more similar to general-purpose architectures but with (much) larger numbers of cores: the GF100 architecture released by NVIDIA in 2009-2010, for example, has a true hardware cache hierarchy, a […]
May, 15
Optimized Composition: Generating Efficient Code for Heterogeneous Systems from Multi-Variant Components, Skeletons and Containers
In this survey paper, we review recent work on frameworks for the high-level, portable programming of heterogeneous multi-/manycore systems (especially, GPU-based systems) using high-level constructs such as annotated user-level software components, skeletons (i.e., predefined generic components) and containers, and discuss the optimization problems that need to be considered in selecting among multiple implementation variants, generating […]