Posts
Apr, 29
Multi-GPU Implementation of the Minimum Volume Simplex Analysis Algorithm for Hyperspectral Unmixing
Spectral unmixing is an important task in remotely sensed hyperspectral data exploitation. The linear mixture model has been widely used to unmix hyperspectral images by identifying a set of pure spectral signatures, called endmembers, and estimating their respective abundances in each pixel of the scene. Several algorithms have been proposed in the recent literature to […]
Apr, 29
A GPU Accelerated Simulator for CO2 Storage
The goal of this thesis has been to develop a fast simulator for large-scale migration of CO2 in saline aquifers. We have also focused on being able to let the CO2 storage atlas from the Norwegian Petroleum Directorate specify the reservoir properties. In order to meet the demands of simulating on large data sets combined […]
Apr, 29
Piko: A Design Framework for Programmable Graphics Pipelines
We present Piko, a framework for designing efficient programmable graphics pipelines. Piko is built around managing work granularity in a programmable and flexible manner, allowing programmers to build load-balanced parallel pipeline implementations, to exploit spatial and producer-consumer locality in the pipeline, and to explore tradeoffs between these considerations. Piko programmers describe a pipeline as […]
Apr, 27
On the Use of Remote GPUs and Low-Power Processors for the Acceleration of Scientific Applications
Many current high-performance clusters include one or more GPUs per node in order to dramatically reduce application execution time, but the utilization of these accelerators is usually far below 100%. In this context, remote GPU virtualization can help to reduce acquisition costs as well as the overall energy consumption. In this paper, we investigate the […]
Apr, 27
Efficient Acceleration of Mutual Information Computation for Nonrigid Registration using CUDA
In this paper, we propose an efficient acceleration method for the nonrigid registration of multimodal images that uses a graphics processing unit (GPU). The key contribution of our method is efficient utilization of on-chip memory for both normalized mutual information (NMI) computation and hierarchical B-spline deformation, which compose a well-known registration algorithm. We implement this […]
Apr, 27
Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures
Based on the premise that preconditioners needed for scientific computing are not only required to be robust in the numerical sense, but also scalable for up to thousands of light-weight cores, we argue that this two-fold goal is achieved for the recently developed self-adaptive multi-elimination preconditioner. For this purpose, we revise the underlying idea and […]
Apr, 27
Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo
This paper proposes a novel photometric stereo solution to jointly estimate surface normals and scattering parameters from a globally planar, homogeneous, translucent object. Similar to classic photometric stereo, our method requires as few as three observations of the translucent object under directional lighting. Naively applying classic photometric stereo results in blurred photometric normals. We […]
Apr, 27
Cellular GPU Models to Euclidean Optimization Problems
This PhD thesis studies and proposes parallel cellular computation models that address different types of NP-hard optimization problems defined in Euclidean space, and their implementation on the Graphics Processing Unit (GPU) platform. The goal is both to handle large problems and to provide substantial acceleration factors by massive […]
Apr, 25
On the Parallelization of Integer Polynomial Multiplication
With the advent of hardware accelerator technologies, multi-core processors and GPUs, much effort has been made to take advantage of those architectures by designing parallel algorithms. To achieve this goal, one needs to consider both algebraic complexity and parallelism, while making efficient use of memory traffic and cache and reducing overheads in the implementations. Polynomial multiplication […]
Apr, 25
A new way in few-body scattering calculations: discretized Faddeev equations solved on GPU
A new approach to very fast and economical few-body scattering calculations is described. The general method is realized in three steps: (i) reformulation of the scattering equations using a convenient analytical form for the channel resolvent operator; (ii) the complete few-body continuum discretization and projection of all operators and wave functions onto the $L_2$ type […]
Apr, 25
Integrating multi-threading and accelerators into DUNE-ISTL
A major challenge in PDE software is the balance between user-level flexibility and performance on heterogeneous hardware. We discuss our ideas on how this challenge can be tackled, using the DUNE framework, and in particular its linear algebra and solver components, as an example. We demonstrate how the former MPI-only implementation is modified to support MPI+[CPU/GPU] threading […]
Apr, 25
One weird trick for parallelizing convolutional neural networks
I present a new way to parallelize the training of convolutional neural networks across multiple GPUs. The method scales significantly better than all alternatives when applied to modern convolutional neural networks.