Posts
May, 21
A Comparison of Serial & Parallel Particle Filters for Time Series Analysis
This paper discusses the application of parallel programming techniques to the estimation of hidden Markov models using a particle filter. It highlights how the Thrust parallel template library can be used to implement a particle filter in parallel. The impact of a parallel particle filter on the running times of three different […]
May, 20
targetDP: an Abstraction of Lattice Based Parallelism with Portable Performance
To achieve high performance on modern computers, it is vital to map algorithmic parallelism to that inherent in the hardware. From an application developer’s perspective, it is also important that code can be maintained in a portable manner across a range of hardware. Here we present targetDP, a lightweight programming layer that allows the abstraction […]
May, 20
Multi-GPU Accelerated Parallel Algorithm of Wallis Transformation for Image Enhancement
With the development of satellite remote sensing technology, the volume of satellite remote sensing data being acquired is increasing rapidly. Consequently, Wallis transformation faces challenges such as large data sizes, high computational intensity and complexity, and a heavy computational load. A fast algorithm and efficient implementation of Wallis filtering […]
May, 20
Exploiting Parallelism in GPUs
Heterogeneous processors with accelerators provide an opportunity to improve performance within a given power budget. Many of these heterogeneous processors contain Graphics Processing Units (GPUs) that can perform graphics and embarrassingly parallel computation orders of magnitude faster than a CPU while using less energy. Beyond these obvious applications for GPUs, a larger variety of applications […]
May, 20
Parallel Approaches to Edit Distance and Approximate String Matching
In this paper, we explore approaches to parallelizing the edit distance problem and the related approximate string matching problem. The edit distance is a measure of the number of individual character insertions, deletions, and substitutions required to transform one string into another string. In the canonical dynamic programming solution to the edit distance, a chain […]
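For readers unfamiliar with the canonical dynamic programming solution mentioned above, here is a minimal sketch in Python. It is not the paper's implementation (the abstract is truncated before describing it). It fills the table in anti-diagonal ("wavefront") order, a common way to expose parallelism in this recurrence: every cell with the same `i + j` depends only on the two previous anti-diagonals, so the inner loop could, in principle, be distributed across threads. The function name `edit_distance` is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the canonical DP, filled in anti-diagonal order.

    Cells on anti-diagonal d (i + j == d) depend only on diagonals d - 1 and
    d - 2, so the inner loop has no intra-diagonal dependencies.
    """
    n, m = len(a), len(b)
    # dist[i][j] = edit distance between the prefixes a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i characters of a[:i]
    for j in range(m + 1):
        dist[0][j] = j  # insert all j characters of b[:j]

    # Sweep interior anti-diagonals d = 2 .. n + m.
    for d in range(2, n + m + 1):
        # Valid interior cells satisfy 1 <= i <= n and 1 <= j = d - i <= m.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[n][m]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```

The sequential version visits cells in the same order a wavefront-parallel version would. A GPU version would typically launch one thread per cell of the current anti-diagonal and synchronize between diagonals.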
May, 20
A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU
Power and energy consumption are an increasing concern in high performance computing. GPUs offer much better performance per watt than multi-core CPUs. In this paper we discuss efforts to redesign the most computationally intensive parts of BLAST, an application that solves the equations of compressible hydrodynamics with high-order finite elements, using […]
May, 18
An OpenCL Runtime and Scheduler for Embedded Multicore DSP Parallel Systems
We address the lack of OpenCL support on multicore DSP systems. We designed a compiler and a runtime framework for TI multicore DSPs that allow OpenCL parallel programs to take advantage of multicore computing resources. First, we use the LLVM and Clang compiler front end to perform source-to-source translation, and in the next […]
May, 18
StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators
GPUs have become widespread in HPC clusters, as the top entries of the latest TOP500 list show. However, exploiting such machines is very challenging, not only because it requires combining two separate paradigms, MPI and CUDA or OpenCL, but also because nodes are heterogeneous and thus require careful load balancing within the nodes themselves. The current paradigms […]
May, 18
Relativistic hydrodynamics on graphics processing units
Hydrodynamics calculations have been successfully used in studies of the bulk properties of the Quark-Gluon Plasma, particularly of elliptic flow and shear viscosity. However, there are areas (for instance, event-by-event simulations of flow fluctuations and studies of higher-order flow harmonics) where further advancement is hampered by the lack of an efficient and precise 3+1D program. This problem can […]
May, 18
Parallelizing AwSpPCA for robust facial recognition using CUDA
This paper analyzes the performance benefits of parallelizing the Adaptive Weighted Sub-patterned Principal Component Analysis (AwSpPCA) algorithm, with the parallel implementation designed to retain the accuracy of its serial version. The serial execution of this algorithm is analyzed first and then compared against its parallel implementation, both […]
May, 18
Parallel Optical Flow Detection Using CUDA
The intention of this thesis is to deploy a parallel implementation of the Lucas-Kanade optical flow detection algorithm. Because it is an important algorithm in computer vision, it is believed to hold much promise for benefiting from techniques that enhance performance through parallel […]
May, 17
Evolutionary Simulation of Life Using CUDA
The idea behind this project was to create a simulation of the evolution of life in CUDA. In this simulation the creatures are individual entities that can interact with the world. Each has its own set of state information and DNA representing it. Through this DNA the creatures evolve via division and mating. The evolution […]