Posts
May, 20
Parallel Approaches to Edit Distance and Approximate String Matching
In this paper, we explore approaches to parallelizing the edit distance problem and the related approximate string matching problem. The edit distance is a measure of the number of individual character insertions, deletions, and substitutions required to transform one string into another string. In the canonical dynamic programming solution to the edit distance, a chain […]
May, 20
A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU
Power and energy consumption are becoming an increasing concern in high performance computing. Compared to multi-core CPUs, GPUs have a much better performance per watt. In this paper we discuss efforts to redesign the most computation intensive parts of BLAST, an application that solves the equations for compressible hydrodynamics with high order finite elements, using […]
May, 18
An OpenCL Runtime and Scheduler for Embedded Multicore DSP Parallel Systems
We address the problem that multicore DSP systems do not support OpenCL programming. We designed a compiler and proposed a runtime framework for TI multicore DSPs, by which OpenCL parallel programs can take advantage of multicore computing resources. First, we make use of the LLVM and Clang compiler front-end to achieve source-to-source translation and in the next […]
May, 18
StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators
GPUs have largely entered HPC clusters, as shown by the top entries of the latest top500 issue. Exploiting such machines is however very challenging, not only because of combining two separate paradigms, MPI and CUDA or OpenCL, but also because nodes are heterogeneous and thus require careful load balancing within nodes themselves. The current paradigms […]
May, 18
Relativistic hydrodynamics on graphics processing units
Hydrodynamics calculations have been successfully used in studies of the bulk properties of the Quark-Gluon Plasma, particularly of elliptic flow and shear viscosity. However, there are areas (for instance event-by-event simulations for flow fluctuations and higher-order flow harmonics studies) where further advancement is hampered by the lack of an efficient and precise 3+1D program. This problem can […]
May, 18
Paralleizing AwSpPCA for robust facial recognition using CUDA
This paper was conducted to analyze the performance benefits of parallelizing the Adaptive Weighted Sub-patterned Principal Component Analysis (Aw SP PCA) algorithm, given that the algorithm is implemented so as to retain the accuracy from its serialized version. The serialized execution of this algorithm is analyzed first and then compared against its parallel implementation, both […]
May, 18
Parallel Optical Flow Detection Using CUDA
The intention of this thesis paper is to deploy a parallel implementation of the optical flow detection algorithm known as the Lucas-Kanade algorithm. As an important algorithm in the field of computer vision, it is believed that it holds much promise and shows much potential for benefiting from techniques used to enhance performance through parallel […]
May, 17
Evolutionary Simulation of Life Using CUDA
The idea behind this project was to create a simulation of the evolution of life in CUDA. In this simulation the creatures are individual entities that can interact with the world. Each has its own set of state information and DNA representing it. Through this DNA the creatures evolve via division and mating. The evolution […]
May, 17
Investigating the Impact of Data Parallelism and GPU Technology on Computer Gaming
According to the current design trends, multithreaded multicore processors will be ubiquitous in every device. In computer gaming, chip-makers are adding more cores to fulfill the next generation performance requirements. A game engine has many ‘tasks’ and data parallelism is an important technique for concurrent execution of these tasks. However, effective implementation of multithreaded computer […]
May, 17
Fine-Grained Parallel Incomplete LU Factorization
This paper presents a new fine-grained parallel algorithm for computing an incomplete LU factorization. All nonzeros in the incomplete factors can be computed in parallel and asynchronously, using one or more sweeps that iteratively improve the accuracy of the factorization. Unlike existing parallel algorithms, the new algorithm does not depend on reordering the matrix. Numerical […]
May, 17
Hierarchical Transparent Programming for Heterogeneous Computing
Parallel computing and the development of parallel programs is a way to reduce program execution time. For many years, sequential optimization was designed without thinking about parallel tasks. Now that multi-core devices have arrived, code parallelization has become more important. Parallel computing is closely related to both the hardware and software points of view, […]
May, 17
Heterogeneity-aware Fault Tolerance using a Self-Organizing Runtime System
Due to the diversity and implicit redundancy in terms of processing units and compute kernels, off-the-shelf heterogeneous systems offer the opportunity to detect and tolerate faults during task execution in hardware as well as in software. To automatically leverage this diversity, we introduce an extension of an online-learning runtime system that combines the benefits of […]