Posts
May, 16
Hybrid Use of OmpSs for a Shock Hydrodynamics Proxy Application
The LULESH proxy application models the behavior of the ALE3D multi-physics code with an explicit shock hydrodynamics problem. It was created to evaluate interactions between programming models and architectures using a representative code that is significantly less complex than the application it models. As identified in the PRACE deliverable D7.2.1 [1], the OmpSs programming model […]
May, 16
A Straightforward Preprocessing Approach for Accelerating Convex Hull Computations on the GPU
An effective strategy for accelerating the calculation of convex hulls for point sets is to filter the input points by discarding interior points. In this paper, we present such a straightforward and efficient preprocessing approach that exploits the GPU. The basic idea behind our approach is to discard the points that are located inside a convex […]
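The abstract's own filtering region is cut off above, so as an illustration only, here is the classic Akl-Toussaint heuristic, a related but different filter: take the four extreme points (minimum and maximum x and y) and discard every point strictly inside the quadrilateral they span, since such a point cannot be a hull vertex. Each point is tested independently, which is what makes this kind of filter a natural fit for one GPU thread per point; the sketch below is sequential Python, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive when b lies left of the line o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def akl_toussaint_filter(points: List[Point]) -> List[Point]:
    """Discard points strictly inside the quadrilateral of the four extreme points."""
    if len(points) < 4:
        return list(points)
    # Leftmost, lowest, rightmost, highest: a counter-clockwise quadrilateral.
    quad = [
        min(points, key=lambda p: p[0]),
        min(points, key=lambda p: p[1]),
        max(points, key=lambda p: p[0]),
        max(points, key=lambda p: p[1]),
    ]

    def strictly_inside(p: Point) -> bool:
        # Strictly left of all four CCW edges. Coincident extreme points give a
        # zero cross product, so a degenerate quadrilateral discards nothing.
        return all(cross(quad[i], quad[(i + 1) % 4], p) > 0 for i in range(4))

    # Each test is independent; on a GPU this would be one thread per point.
    return [p for p in points if not strictly_inside(p)]
```

The surviving points, which always include every hull vertex, can then be passed to any exact hull algorithm.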
May, 15
Multi-GPGPU Cellular Automata Simulations using OpenACC
The Frisch-Hasslacher-Pomeau (FHP) model is a lattice gas cellular automaton designed to simulate fluid flows using exact, purely Boolean arithmetic, without any round-off error. Here we investigate how to port it efficiently to clusters of Fermi-class graphics processing units. To this end, two multi-GPU implementations were developed and examined: one using the NVIDIA […]
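To make the "purely Boolean" point concrete, here is a minimal sketch of the collision step at a single site for the FHP-I variant, assuming the usual encoding of one bit per hexagonal lattice direction (six bits per site, direction k at 60k degrees). Only head-on two-particle and symmetric three-particle configurations change; a chirality flag, which a real code would draw at random, picks the rotation sense. Everything is integer bit manipulation, so particle number and momentum are conserved exactly. The function names are hypothetical, and neither the streaming step nor the multi-GPU decomposition studied in the paper is shown.

```python
MASK = 0b111111  # six channels per site; bit k = a particle moving in direction k


def rotate(state: int, r: int) -> int:
    """Rotate the six direction bits by r * 60 degrees (r may be negative)."""
    r %= 6
    return ((state << r) | (state >> (6 - r))) & MASK


def collide(state: int, chirality: bool) -> int:
    """FHP-I collision rule for one site; all other configurations pass through."""
    # Head-on pairs (k, k + 3) rotate by +60 or -60 degrees depending on chirality.
    for k in range(3):
        if state == (1 << k) | (1 << (k + 3)):
            return rotate(state, 1 if chirality else -1)
    # Symmetric triples swap: {0, 2, 4} <-> {1, 3, 5}.
    if state in (0b010101, 0b101010):
        return state ^ MASK
    return state
```

A full simulation would apply this to every site (typically through a precomputed lookup table or bitwise logic over many sites at once) and then stream each bit to the neighbouring site in its direction.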
May, 15
Real-time Image Processing on Low Cost Embedded Computers
In 2012, a federal mandate required the FAA to integrate unmanned aerial systems (UAS) into the National Airspace System (NAS) by 2015 for civilian and commercial use. A significant driver of the increasing popularity of these systems is the rise of open hardware and open software solutions, which allow hobbyists to build small […]
May, 15
Parallelization of Shape Diameter Function Computation using OpenCL
Shape Diameter Function (SDF) is a scalar function that expresses a measure of the diameter of the object’s volume in the neighborhood of each point on the surface of an input mesh. It is fundamental to many computer graphics applications, such as consistent mesh partitioning and skeletonization. The algorithm sends several rays inside a […]
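The description of the ray casting is cut off above. For orientation only: in the commonly cited formulation of the SDF, rays are cast inside a cone around the inward normal, ray lengths far from the median are discarded, and the remaining lengths are averaged. The sketch below shows only that aggregation step for one surface point, assuming the ray lengths and their angles to the cone axis are already known. The one-standard-deviation outlier rule and the cosine weighting are assumptions chosen for this example, not necessarily this paper's choices, and `sdf_value` is a hypothetical name. Because each surface point is handled independently, this per-point step is the kind of work an OpenCL kernel could assign to one work-item.

```python
import math
from statistics import median, pstdev
from typing import Sequence


def sdf_value(lengths: Sequence[float], angles: Sequence[float]) -> float:
    """Aggregate the ray lengths cast from one surface point into an SDF value.

    lengths: distance from the point to the opposite side of the mesh, per ray.
    angles: angle in radians between each ray and the cone axis (below pi / 2).
    """
    med = median(lengths)
    spread = pstdev(lengths)
    # Keep rays within one standard deviation of the median (assumed rule).
    kept = [(length, angle) for length, angle in zip(lengths, angles)
            if abs(length - med) <= spread]
    if not kept:  # defensive fallback: keep every ray
        kept = list(zip(lengths, angles))
    # Rays closer to the cone axis get larger weights (assumed cosine weighting).
    weights = [math.cos(angle) for _, angle in kept]
    return sum(w * length for w, (length, _) in zip(weights, kept)) / sum(weights)
```

In the example call `sdf_value([1.0, 1.1, 0.9, 10.0], [0.0, 0.0, 0.0, 0.0])`, the stray 10.0 (for instance a ray escaping through a thin gap) is rejected as an outlier and the result is the mean of the other three lengths.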
May, 15
Performance Optimization of GPU ELF-Codes
GPUs (Graphics Processing Units) are of interest for their favorable performance-to-price ratio (GF/s per unit price). Compared with their beginnings in the early 1980s, today's GPU architectures are more similar to general-purpose architectures, but with (much) larger numbers of cores; the GF100 architecture released by NVIDIA in 2009-2010, for example, has a true hardware cache hierarchy, a […]
May, 15
Optimized Composition: Generating Efficient Code for Heterogeneous Systems from Multi-Variant Components, Skeletons and Containers
In this survey paper, we review recent work on frameworks for the high-level, portable programming of heterogeneous multi-/manycore systems (especially GPU-based systems) using high-level constructs such as annotated user-level software components, skeletons (i.e., predefined generic components) and containers, and discuss the optimization problems that need to be considered in selecting among multiple implementation variants, generating […]
May, 14
Preliminary results of autotuning GEMM kernels for the NVIDIA Kepler architecture: GeForce GTX 680
Kepler is the newest GPU architecture from NVIDIA, and the GTX 680 is the first commercially available graphics card based on that architecture. Matrix multiplication is a canonical computational kernel, and often the main target of initial optimization efforts for a new chip. This article presents preliminary results of automatically tuning matrix multiplication kernels for […]
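Autotuning in this sense means generating many variants of a kernel from a parameter space, timing each one on the target hardware, and keeping the fastest. As a rough CPU-side analogy only, the sketch below treats the tile size of a blocked matrix multiplication as the single tuning parameter and times each candidate; a real GPU GEMM tuner typically searches over many more parameters, such as thread-block tile shapes and per-thread register blocking. The names `blocked_matmul` and `autotune` are hypothetical.

```python
import time
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def blocked_matmul(a: Matrix, b: Matrix, tile: int) -> Matrix:
    """C = A @ B computed tile by tile; `tile` is the tuning parameter."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                # Multiply one tile of A by one tile of B, accumulating into C.
                for i in range(i0, min(i0 + tile, n)):
                    a_row, c_row = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, b_row = a_row[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            c_row[j] += a_ip * b_row[j]
    return c


def autotune(a: Matrix, b: Matrix, candidates: Sequence[int],
             repeats: int = 3) -> Tuple[int, Dict[int, float]]:
    """Time every candidate tile size; return the fastest one plus all timings."""
    timings: Dict[int, float] = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):  # best-of-N reduces timing noise
            start = time.perf_counter()
            blocked_matmul(a, b, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    return min(timings, key=timings.get), timings
```

The winning tile size depends on the machine, which is the whole point of tuning empirically rather than picking the parameter by hand.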
May, 14
Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method
In this paper, the adaptability of the neutron diffusion numerical algorithm to GPUs was studied, and a GPU-accelerated multi-group 3D neutron diffusion code based on the finite difference method was developed. The IAEA 3D PWR benchmark problem was used as the numerical test. The results demonstrate both high efficiency and adequate accuracy of the GPU implementation […]
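For readers new to the method, the sketch below reduces the problem to one energy group in a 1D bare slab, which is far simpler than the multi-group 3D IAEA benchmark. It discretizes -D φ'' + Σa φ = (1/k) νΣf φ with central differences, assumes zero flux at both slab edges, and finds the multiplication factor k by power (source) iteration with one tridiagonal solve per outer iteration. The parameter values and function names are hypothetical, and the sketch says nothing about how the paper maps its solver onto the GPU.

```python
from typing import List


def thomas(lower: List[float], diag: List[float], upper: List[float],
           rhs: List[float]) -> List[float]:
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def k_eff_slab(width: float, n: int, D: float, sigma_a: float, nu_sigma_f: float,
               tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """One-group k-eigenvalue of a bare slab with n interior nodes and zero-flux edges."""
    h = width / (n + 1)
    diag = [2.0 * D / h**2 + sigma_a] * n  # leakage + absorption
    off = [-D / h**2] * n                  # coupling to neighbouring nodes
    phi = [1.0] * n
    k = 1.0
    for _ in range(max_iter):
        # Fission source from the previous flux, scaled by the current k estimate.
        source = [nu_sigma_f * p / k for p in phi]
        new_phi = thomas(off, diag, off, source)
        new_k = k * sum(new_phi) / sum(phi)  # ratio of successive fission productions
        scale = max(new_phi)
        phi = [p / scale for p in new_phi]   # normalize to keep values bounded
        if abs(new_k - k) < tol:
            return new_k
        k = new_k
    return k
```

As the mesh is refined, the result should approach the analytic one-group value νΣf / (Σa + D (π / width)²), which makes a convenient sanity check before moving to more groups and dimensions.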
May, 14
Evaluating the Power of GPU Acceleration for IDW Interpolation Algorithm
We first present two GPU implementations of the standard Inverse Distance Weighting (IDW) interpolation algorithm: a tiled version that takes advantage of shared memory, and a CDP version that is implemented using CUDA Dynamic Parallelism (CDP). We then evaluate the power of GPU acceleration for the IDW interpolation algorithm by comparing the performance of the CPU implementation […]
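Standard IDW estimates the value at a query point x as a weighted mean of the known values u_i, with weights w_i = 1 / d(x, x_i)^p, so that nearby samples dominate; p = 2 is a common choice and is assumed here. The sketch below is a plain Python version of that formula. To echo the idea behind a tiled kernel, it walks over the data points in fixed-size chunks and accumulates the numerator and denominator across chunks, the way a GPU kernel might stage each chunk in shared memory. The `tile` parameter and the function name are hypothetical, and this is not the paper's CUDA code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def idw(query: Point, points: Sequence[Point], values: Sequence[float],
        power: float = 2.0, tile: int = 256) -> float:
    """Inverse Distance Weighting estimate at `query` from known samples."""
    num = 0.0
    den = 0.0
    for start in range(0, len(points), tile):
        # One chunk of data points; a GPU kernel might stage it in shared memory.
        chunk = zip(points[start:start + tile], values[start:start + tile])
        for (px, py), v in chunk:
            d = math.hypot(query[0] - px, query[1] - py)
            if d == 0.0:
                return v  # query coincides with a sample: return it exactly
            w = 1.0 / d ** power
            num += w * v
            den += w
    return num / den
```

Every query point is computed independently, so the natural GPU mapping is one thread per query, with all threads in a block sharing each staged chunk of data points.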
May, 14
Build and Travel KD-Tree with CUDA
Ray tracing is an important and widely used tool in computer graphics. The entertainment and game industries have already benefited greatly from ray tracing. However, because of the high computational load, designers and end users have long been forced to use offline ray tracing tools. In ray tracing, most of the computation is concentrated […]
May, 14
Efficient Energy Minimization in Finite-Difference Micromagnetics: Speeding up Hysteresis Computations
We implement an efficient energy-minimization algorithm for finite-difference micromagnetics that proves especially useful for the computation of hysteresis loops. Compared with results obtained by time integration of the Landau-Lifshitz-Gilbert equation, a speedup of up to two orders of magnitude is gained. The method is implemented in a finite-difference code running on CPUs as well as […]