Posts
Oct, 22
Optimization Techniques for Mapping Algorithms and Applications onto CUDA GPU Platforms and CPU-GPU Heterogeneous Platforms
An emerging trend in processor architecture points to the number of cores per chip doubling every two years, with the same or a decreased clock speed. Of particular interest to this thesis is the class of many-core processors, which are becoming more attractive due to their high performance, low cost, and low power consumption. […]
Oct, 22
Fast Parallel Algorithm for Enumerating All Chordless Cycles in Graphs
Finding chordless cycles is an important theoretical problem in graph theory. It can also be applied to practical problems, such as discovering which predators compete for the same food in ecological networks. Motivated by the problem's theoretical interest and by its significant practical importance, we present in this paper a parallel […]
Oct, 22
3D simulation of complex shading affecting PV systems taking benefit from the power of graphics cards developed for the video game industry
Shading reduces the power output of a photovoltaic (PV) system. The design engineering of PV systems requires modeling and evaluating shading losses. Some PV systems are affected by complex shading scenes whose resulting PV energy losses are very difficult to evaluate with current modeling tools. Several specialized PV design and simulation software packages include the possibility […]
Oct, 22
Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems
The Kernel Polynomial Method (KPM) is a well-established scheme in quantum physics and quantum chemistry to determine the eigenvalue density and spectral properties of large sparse matrices. In this work we demonstrate the high optimization potential and feasibility of peta-scale heterogeneous CPU-GPU implementations of the KPM. At the node level we show that it is […]
Oct, 20
A Performance Comparison of Sort and Scan Libraries for GPUs
Sorting and scanning are two fundamental primitives for constructing highly parallel algorithms. A number of libraries now provide implementations of these primitives for GPUs, but there is relatively little information about the performance of these implementations. We benchmark seven libraries for 32-bit integer scan and sort, and sorting 32-bit values by 32-bit integer keys. We show […]
Oct, 20
Massively parallel read mapping on GPUs with the q-group index and PEANUT
We present the q-group index, a novel data structure for read mapping tailored towards graphics processing units (GPUs) with a small memory footprint and efficient parallel algorithms for querying and building. On top of the q-group index we introduce PEANUT, a highly parallel GPU-based read mapper. PEANUT provides the possibility to output both the best […]
Oct, 20
Heterogeneous computing with an algorithmic skeleton framework
The Graphics Processing Unit (GPU) is present in almost every modern-day personal computer. Despite their special-purpose design, GPUs have been increasingly used for general computations with very good results. Hence, there is a growing effort from the community to seamlessly integrate this kind of device into everyday computing. However, to fully exploit the […]
Oct, 20
Fast-Fourier-Transform-Based Electrical Noise Measurements
We have shown how the Fourier spectrum and the power spectral density can be estimated in concrete measurements. Moreover, we have derived spectral leakage, which is a systematic error in spectrum computation. The Nyquist-Shannon sampling theorem and aliasing have been discussed. Furthermore, we have implemented a spectrum analyzer using a combination of LabView, GPU computing […]
Oct, 20
High-Dimensional Adaptive Particle Swarm Optimization on Heterogeneous Systems
Much work has recently been reported in parallel GPU-based particle swarm optimization (PSO). Motivated by the encouraging results of these investigations, while also recognizing the limitations of GPU-based methods for big problems using a large amount of data, this paper explores the efficacy of employing other types of parallel hardware for PSO. Most commodity systems […]
Oct, 18
A Review of CUDA, MapReduce, and Pthreads Parallel Computing Models
The advent of high-performance computing (HPC) and graphics processing units (GPUs) presents an enormous computational resource for large data transactions (big data) that require parallel processing for robust and prompt data analysis. While a number of HPC frameworks have been proposed, parallel programming models present a number of challenges, for instance, how to fully […]
Oct, 18
StreamWorks: An Energy-efficient Embedded Co-processor for Stream Computing
Stream processing has emerged as an important model of computation, especially in the context of the multimedia and communication subsystems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and […]
Oct, 18
Hybrid CPU-GPU Implementation of Tracking-Learning-Detection Algorithm
Tracking objects in a video stream is an important problem in robot learning (learning an object's visual features from different perspectives as it moves, rotates, scales, and is subjected to morphological changes such as erosion), defense, public security, and many other domains. In this thesis, we focus on a recently proposed tracking framework […]