Posts
Nov, 28
Short-time Fourier transform laser Doppler holography
We report a demonstration of laser Doppler holography at a sustained acquisition rate of 250 Hz on a 1 Megapixel complementary metal-oxide-semiconductor (CMOS) sensor array and image display at a 10 Hz frame rate. The holograms are optically acquired in an off-axis configuration, with a frequency-shifted reference beam. Wide-field imaging of optical fluctuations in a 250 Hz […]
Nov, 27
Softshell: Dynamic Scheduling on GPUs
In this paper we present Softshell, a novel execution model for devices composed of multiple processing cores operating in a single instruction, multiple data fashion, such as graphics processing units (GPUs). The Softshell model is intuitive and more flexible than the kernel-based adaption of the stream processing model, which is currently the dominant model for […]
Nov, 27
From Parallel Programs to Customized Parallel Processors
The need for fast time to market of new embedded processor-based designs calls for a rapid design methodology of the included processors. The call for such a methodology is even more emphasized in the context of so-called soft cores targeted to reconfigurable fabrics where per-design processor customization is commonplace. The C language has been […]
Nov, 27
Optimizing 3D Convolutions for Wavelet Transforms on CPUs with SSE Units and GPUs
Nanosimulations present a big HPC challenge as they present increasing performance demands in heterogeneous execution environments. In this paper, we present our optimization methodology for BigDFT, a nanosimulation software using Density Functional Theory. We explore autotuning possibilities for BigDFT’s 3D convolutions by studying optimization techniques for several architectures. Namely, we focus on processors with vector […]
Nov, 27
A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms
New multi-core CPU and GPU architectures promise high computational power at a low cost if suitable computational algorithms can be developed. However, parallel programming for such architectures is usually non-portable, low-level and error-prone. To make the computational power of new multi-core architectures more easily available to Modelica modelers, we have developed the ParModelica algorithmic language […]
Nov, 27
Hardware-Accelerated Raycasting: Towards an Effective Brain MRI Visualization
The rapid development in information technology has immensely contributed to the use of modern approaches for visualizing volumetric data. Consequently, medical volume visualization is increasingly attracting attention towards achieving an effective visualization algorithm for medical diagnosis and pre-treatment planning. Previously, research has addressed the implementation of algorithms that can visualize 2-D images in 3-D. Meanwhile, […]
Nov, 26
Energy efficiency vs. performance of the numerical solution of PDEs: an application study on a low-power ARM-based cluster
Power consumption and energy efficiency are becoming critical aspects in the design and operation of large scale HPC facilities, and it is unanimously recognised that future exascale supercomputers will be strongly constrained by their power requirements. At current electricity costs, operating an HPC system over its lifetime can already be on par with the initial […]
Nov, 26
Stochastic Analysis of a Queue Length Model Using a Graphics Processing Unit
Mathematical modeling is an inevitable part of system analysis and design in science and engineering. When a parametric mathematical description is used, the issue of parameter estimation accuracy arises. Models with uncertain parameter values can be evaluated using various methods and computer simulation is among the most popular in the engineering community. Nevertheless, an […]
Nov, 26
A compiler toolkit for array-based languages targeting CPU/GPU hybrid systems
This paper presents a compiler toolkit that addresses two important emerging challenges: (1) effectively compiling dynamic array-based languages such as MATLAB, Python and R; and (2) effectively utilizing a wide range of rapidly evolving hybrid CPU/GPU architectures. The toolkit provides: a high-level IR specifically designed to express a wide range of array-based computations and indexing […]
Nov, 26
High Performance Radiation Transport Simulations: Preparing for TITAN
In this paper we describe the Denovo code system. Denovo solves the six-dimensional, steady-state, linear Boltzmann transport equation, of central importance to nuclear technology applications such as reactor core analysis (neutronics), radiation shielding, nuclear forensics and radiation detection. The code features multiple spatial differencing schemes, state-of-the-art linear solvers, the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm for […]
Nov, 26
A Customized 3D GPU Poisson Solver for Free BCs
A 3-dimensional GPU Poisson solver is developed for all possible combinations of free and periodic boundary conditions along the three directions. It is benchmarked for various grid sizes and different BCs and a significant performance gain is observed for problems including one or more free BCs. The GPU Poisson solver is also benchmarked against two […]
Nov, 25
PyFAI, a versatile library for azimuthal regrouping
2D area detectors such as CCD or pixel detectors have become popular in the last 15 years for diffraction experiments (e.g. for WAXS, SAXS, single crystal and powder diffraction (XRPD)). These detectors have a large sensitive area of millions of pixels with high spatial resolution. The software package pyFAI has been designed to reduce SAXS, WAXS […]