Posts
Aug, 26
Fast Object Re-Detection and Localization in Video for Spatio-Temporal Fragment Creation
This paper presents a method for the detection and localization of instances of user-specified objects within a video or a collection of videos. The proposed method is based on the extraction and matching of SURF descriptors in video frames and further incorporates a number of improvements so as to enhance both the detection accuracy and […]
Aug, 26
Estimating the WCET of GPU-Accelerated Applications using Hybrid Analysis
The massive parallelism offered by Graphics Processing Units (GPUs) is now routinely exploited to accelerate computationally intensive tasks in a wide variety of application domains. Efficient GPU programming in languages such as CUDA and OpenCL requires careful application of hand optimisations to exploit parallelism and locality while minimising synchronisation. The effectiveness of such optimisations can […]
Aug, 26
Lattice Boltzmann Simulations of Multiphase Flows
This thesis is a comprehensive account of my experiences implementing the Lattice Boltzmann Method (LBM) for the purpose of simulating multiphase flows relevant to Air Conditioning and Refrigeration Center (ACRC) applications. Other methodologies have been used to simulate multiphase flow including finite volume based Navier-Stokes solvers. These methods have found reasonable success in simulating multiphase […]
Aug, 26
OpenCL programming using Python syntax
We describe ocl, a Python library built on top of PyOpenCL and NumPy. It allows programming GPU devices using Python. Python functions marked up with the provided decorator are converted into C99/OpenCL and compiled using the JIT at runtime. This approach lowers the barrier to entry to programming GPU devices since it requires […]
Aug, 26
SystemC simulation on GP-GPUs: CUDA vs. OpenCL
SystemC is a widespread language for developing SoC designs. Unfortunately, most SystemC simulators are based on a strictly sequential scheduler that heavily limits their performance, impacting verification schedules and time-to-market of new designs. Parallelizing SystemC simulation entails a complete re-design of the simulator kernel for the specific target parallel architectures. This paper proposes an automatic […]
Aug, 26
Performance Evaluation of Intel Xeon Phi Coprocessor using XKaapi
This paper presents preliminary performance comparisons of parallel applications developed natively for the Intel Xeon Phi accelerator using three different parallel programming environments and their associated runtime systems. We compare Intel OpenMP, Intel CilkPlus and XKaapi together on the same benchmark suite. Our benchmark suite is composed of two computing kernels: a Fibonacci computation that […]
Aug, 26
Optimisation and Parallelism in Synchronous Digital Circuit Simulators
Digital circuit simulation often requires a large amount of computation, resulting in long run times. We consider several techniques for optimising a brute force synchronous circuit simulator: an algorithm using an event queue that avoids recalculating quiescent parts of the circuit, a marking algorithm that is similar to the event queue but that avoids a […]
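The event-queue idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' simulator: the netlist `GATES`, the signal names, and the helper functions `build_fanout` and `simulate` are hypothetical and chosen only for illustration. The sketch re-evaluates a gate only when one of its inputs has changed, so quiescent parts of the circuit are never recalculated.

```python
from collections import deque

# Hypothetical netlist (an assumption for illustration only):
# gate name -> (function, input signal names, output signal name)
GATES = {
    "g1": (lambda a, b: a & b, ("a", "b"), "x"),
    "g2": (lambda x, c: x | c, ("x", "c"), "y"),
    "g3": (lambda d: 1 - d, ("d",), "z"),
}

def build_fanout(gates):
    """Map each signal to the list of gates that read it."""
    fanout = {}
    for name, (_, inputs, _) in gates.items():
        for sig in inputs:
            fanout.setdefault(sig, []).append(name)
    return fanout

def simulate(gates, values, changed_signals):
    """Propagate signal changes through the circuit using an event queue.

    `values` is updated in place. Returns the number of gate evaluations,
    which shows how much work the queue avoided.
    """
    fanout = build_fanout(gates)
    queue = deque()
    queued = set()  # gates currently waiting in the queue

    def schedule(signal):
        # Enqueue every gate that reads the changed signal, once each.
        for gate in fanout.get(signal, []):
            if gate not in queued:
                queue.append(gate)
                queued.add(gate)

    for sig in changed_signals:
        schedule(sig)

    evaluations = 0
    while queue:
        gate = queue.popleft()
        queued.discard(gate)
        fn, inputs, out = gates[gate]
        new_value = fn(*(values[s] for s in inputs))
        evaluations += 1
        # Only a changed output creates new events downstream.
        if new_value != values[out]:
            values[out] = new_value
            schedule(out)
    return evaluations

# Start from a consistent state, then flip input "a" from 0 to 1.
values = {"a": 0, "b": 1, "c": 0, "d": 0, "x": 0, "y": 0, "z": 1}
values["a"] = 1
count = simulate(GATES, values, ["a"])
```

In this example only `g1` and `g2` are evaluated; `g3` depends solely on the unchanged signal `d` and is skipped, whereas a brute-force simulator would evaluate all three gates.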
Aug, 26
Optimal Control Problem and Power-Efficient Medical Image Processing Using Puma
As a starting point of this paper we present a problem from mammographic image processing. We show how it can be formulated as an optimal control problem for PDEs and illustrate that it leads to penalty terms which are non-standard in the theory of optimal control of PDEs. To solve this control problem we use […]
Aug, 26
Evaluation of P-Scheme/G Algorithm for Solving Recurrence Equations
A parallel algorithm called P-scheme/G is proposed for solving recurrence equations on GPGPU systems. It is based on the P-scheme algorithm, which was originally developed for distributed-memory multicomputers. To achieve high-performance computation on GPGPU systems, our method alleviates branch divergence by reducing strided data accesses. We also illustrate the […]
Aug, 24
SOCL: An OpenCL Implementation with Automatic Multi-Device Adaptation Support
To fully tap into the potential of today’s heterogeneous machines, offloading parts of an application to accelerators is not sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. In […]
Aug, 24
Performance Optimization of Vision Apps on Mobile Application Processor
Optimizing the performance of compute-intensive vision apps running on a mobile application processor (AP) is critical to a satisfactory experience for smartphone and tablet users. Most existing vision algorithms were primarily designed and implemented for desktop and server platforms. Porting them to a mobile platform without adapting the algorithms to account for the platform’s limitations would cause serious […]
Aug, 24
OpenCL Programming Guide for Mac
OpenCL (Open Computing Language) is an open standard for cross-platform programming of modern highly parallel processor architectures. Introduced with OS X v10.6, OpenCL consists of a C99-based programming language designed for parallelism, a powerful scheduling API, and a flexible runtime that executes kernels on the CPU or GPU. OpenCL lets your application harness the computing power of these […]