Posts
May, 15
Multi-GPGPU Cellular Automata Simulations using OpenACC
The Frisch-Hasslacher-Pomeau (FHP) model is a lattice gas cellular automaton designed to simulate fluid flows using the exact, purely Boolean arithmetic, without any round-off error. Here we investigate the problem of its efficient porting to clusters of Fermi-class graphic processing units. To this end two multi-GPU implementations were developed and examined: one using the NVIDIA […]
May, 15
Real-time Image Processing on Low Cost Embedded Computers
In 2012 a federal mandate was imposed that required the FAA to integrate unmanned aerial systems (UAS) into the national airspace (NAS) by 2015 for civilian and commercial use. A significant driver for the increasing popularity of these systems is the rise in open hardware and open software solutions which allow hobbyists to build small […]
May, 15
Parallelization of Shape Diameter Function Computation using OpenCL
Shape Diameter Function (SDF) is a scalar function that expresses a measure of the diameter of the object’s volume in the neighborhood of each point on the surface on an input mesh. It is fundamental in many applications in computer graphics used for consistent mesh partitioning and skeletonization. The algorithm sends several rays inside a […]
May, 15
Performance Optimization of GPU ELF-Codes
GPUs (Graphic Processing Units) are of interest for their favorable ratio GF/s/price. Compared to the beginning – early 1980’s – nowadays GPU architectures are more similar to general purpose architectures but with (much) larger numbers of cores – the GF100 architecture released by NVIDIA in 2009-2010, for example, has a true hardware cache hierarchy, a […]
May, 15
Optimized Composition: Generating Efficient Code for Heterogeneous Systems from Multi-Variant Components, Skeletons and Containers
In this survey paper, we review recent work on frameworks for the high-level, portable programming of heterogeneous multi-/manycore systems (especially, GPU-based systems) using high-level constructs such as annotated user-level software components, skeletons (i.e., predefined generic components) and containers, and discuss the optimization problems that need to be considered in selecting among multiple implementation variants, generating […]
May, 14
Preliminary results of autotuning GEMM kernels for the NVIDIA Kepler architecture-GeForce GTX 680
Kepler is the newest GPU architecture from NVIDIA, and the GTX 680 is the first commercially available graphics card based on that architecture. Matrix multiplication is a canonical computational kernel, and often the main target of initial optimization efforts for a new chip. This article presents preliminary results of automatically tuning matrix multiplication kernels for […]
May, 14
Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method
In this paper, the adaptability of the neutron diffusion numerical algorithm on GPUs was studied, and a GPUaccelerated multi-group 3D neutron diffusion code based on finite difference method was developed. The IAEA 3D PWR benchmark problem was calculated in the numerical test. The results demonstrate both high efficiency and adequate accuracy of the GPU implementation […]
May, 14
Evaluating the Power of GPU Acceleration for IDW Interpolation Algorithm
We first present two GPU implementations of the standard Inverse Distance Weighting (IDW) interpolation algorithm, the tiled version that takes advantage of shared memory and the CDP version that is implemented using CUDA Dynamic Parallelism (CDP). Then we evaluate the power of GPU acceleration for IDW interpolation algorithm by comparing the performance of CPU implementation […]
May, 14
Build and Travel KD-Tree with CUDA
Ray tracing is an important and widely used tool in computer graphic. Entertainment and game industry have already benefit a lot from ray tracing. However, designers and end-users are forced to use off-line ray tracing tools for a long time due to the high computation load. In ray tracing, most of the computation is concentrated […]
May, 14
Efficient Energyminimization in Finite-Difference Micromagnetics: Speeding up Hysteresis Computations
We implement an efficient energy-minimization algorithm for finite-difference micromagnetics that proofs especially usefull for the computation of hysteresis loops. Compared to results obtained by time integration of the Landau-Lifshitz-Gilbert equation, a speedup of up to two orders of magnitude is gained. The method is implemented in a finite-difference code running on CPUs as well as […]
May, 13
Cluster-Level Tuning of a Shallow Water Equation Solver on the Intel MIC Architecture
The paper demonstrates the optimization of the execution environment of a hybrid OpenMP+MPI computational fluid dynamics code (shallow water equation solver) on a cluster enabled with Intel Xeon Phi coprocessors. The discussion includes: – Controlling the number and affinity of OpenMP threads to optimize access to memory bandwidth; – Tuning the inter-operation of OpenMP and […]
May, 13
Fast Finite Solar Radiation Pressure Model Integration Using OpenGL
By coupling a common approach to vector graphics, OpenGL, high-fidelity solar-radiation pressure (SRP) effects are calculated easily and quickly with the power of graphics processing units (GPUs). For some missions SRP is a significant perturbation and a consideration wherein a simplified plate model does not suffice. OpenGL is a set of commands that interact with […]