Posts
Aug, 26
Optimal Control Problem and Power-Efficient Medical Image Processing Using Puma
As the starting point of this paper, we present a problem from mammographic image processing. We show how it can be formulated as an optimal control problem for PDEs and illustrate that it leads to penalty terms that are non-standard in the theory of optimal control of PDEs. To solve this control problem we use […]
Aug, 26
Evaluation of P-Scheme/G Algorithm for Solving Recurrence Equations
A parallel algorithm called P-scheme/G is proposed for solving recurrence equations on GPGPU systems. It is based on the P-scheme algorithm, which was originally developed for distributed-memory multicomputers. To achieve high-performance computation on GPGPU systems, our method alleviates branch divergence by reducing strided data accesses. We also illustrate the […]
Aug, 24
SOCL: An OpenCL Implementation with Automatic Multi-Device Adaptation Support
To fully tap into the potential of today’s heterogeneous machines, offloading parts of an application to accelerators is not sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. In […]
Aug, 24
Performance Optimization of Vision Apps on Mobile Application Processor
Optimizing the performance of compute-intensive vision apps running on a mobile application processor (AP) is critical to a satisfactory experience for smartphone and tablet users. Most existing vision algorithms were primarily designed and implemented for desktop and server platforms. Porting them to a mobile platform without adapting the algorithms to account for the platform’s limitations would cause serious […]
Aug, 24
OpenCL Programming Guide for Mac
OpenCL (Open Computing Language) is an open standard for cross-platform programming of modern highly parallel processor architectures. Introduced with OS X v10.6, OpenCL consists of a C99-based programming language designed for parallelism, a powerful scheduling API, and a flexible runtime that executes kernels on the CPU or GPU. OpenCL lets your application harness the computing power of these […]
Aug, 24
Analysis-Driven Design of Parallel Floating-Point Matrix Multiplication for Implementation in Reconfigurable Logic
The objective of this research is to design an efficient and flexible implementation of parallel matrix multiplication for FPGA devices by analyzing the computation and studying its design space. In order to adapt to the FPGA platform, the design employs blocking and parallelization. Blocked matrix multiplication enables processing arbitrarily large matrices using limited memory capacity, […]
Aug, 24
Reverse Computation for Rollback-based Fault Tolerance in Large Parallel Systems: Evaluating the Potential Gains and Systems Effects
Reverse computation is presented here as an important future direction in addressing the challenge of fault-tolerant execution on very large cluster platforms for parallel computing. As the scale of parallel jobs increases, traditional checkpointing approaches suffer scalability problems ranging from computational slowdowns to high congestion at the persistent stores for checkpoints. Reverse computation can […]
Aug, 23
A memory access model for highly-threaded many-core architectures
A number of highly-threaded, many-core architectures hide memory-access latency by low-overhead context switching among a large number of threads. The speedup of a program on these machines depends on how well the latency is hidden. If the number of threads were infinite, theoretically, these machines could provide the performance predicted by the PRAM analysis of […]
Aug, 23
A Unified Framework for Multi-Sensor HDR Video Reconstruction
One of the most successful approaches to modern high quality HDR-video capture is to use camera setups with multiple sensors imaging the scene through a common optical system. However, such systems pose several challenges for HDR reconstruction algorithms. Previous reconstruction techniques have considered debayering, denoising, resampling (alignment) and exposure fusion as separate problems. In contrast, […]
Aug, 23
Implementation of Kirchhoff prestack depth migration on GPU
The massively parallel nature of Graphics Processing Units has made them an attractive platform for some computationally intensive algorithms. This article presents a method to run 3D Kirchhoff prestack depth migration on GPU-based clusters. Compared to a CPU-only version of the same algorithm, the new approach delivers significantly greater efficiency. An actual production […]
Aug, 23
Tight Binding Molecular Dynamics on CPU and GPU clusters
The aim of this dCSE project was to improve the TBE code, which is based on the tight-binding model with self-consistent multipole charge transfer. Given an appropriate parameterisation, the code is general and can be used to simulate a wide variety of systems and phenomena such as bond breaking, charge and magnetic polarisation. […]
Aug, 23
Estimation of Skin Optical Parameters for Real-Time Hyperspectral Imaging Applications using GPGPU Parallel Computing
Hyperspectral imaging with a high spatial and spectral resolution can be used to analyze materials using spectroscopic methods. This can be applied to skin as a general-purpose real-time diagnostic tool. Light transport models, like the diffusion model, can describe the light propagation in tissue before the light is captured by the hyperspectral camera. The […]