Posts
Oct, 20
Massively parallel read mapping on GPUs with the q-group index and PEANUT
We present the q-group index, a novel data structure for read mapping tailored towards graphics processing units (GPUs) with a small memory footprint and efficient parallel algorithms for querying and building. On top of the q-group index we introduce PEANUT, a highly parallel GPU-based read mapper. PEANUT provides the possibility to output both the best […]
Oct, 20
Heterogeneous computing with an algorithmic skeleton framework
The Graphics Processing Unit (GPU) is present in almost every modern-day personal computer. Despite its specific-purpose design, it has been increasingly used for general computations with very good results. Hence, there is a growing effort from the community to seamlessly integrate this kind of device in everyday computing. However, to fully exploit the […]
Oct, 20
Fast-Fourier-Transform-Based Electrical Noise Measurements
We have shown how the Fourier spectrum and the power spectral density can be estimated in concrete measurements. Moreover, we have derived spectral leakage, which is a systematic error in spectrum computation. The Nyquist-Shannon sampling theorem and aliasing have been discussed. Furthermore, we have implemented a spectrum analyzer using a combination of LabView, GPU computing […]
Oct, 20
High-Dimensional Adaptive Particle Swarm Optimization on Heterogeneous Systems
Much work has recently been reported in parallel GPU-based particle swarm optimization (PSO). Motivated by the encouraging results of these investigations, while also recognizing the limitations of GPU-based methods for big problems using a large amount of data, this paper explores the efficacy of employing other types of parallel hardware for PSO. Most commodity systems […]
Oct, 18
A Review of CUDA, MapReduce, and Pthreads Parallel Computing Models
The advent of high performance computing (HPC) and graphics processing units (GPUs) presents an enormous computation resource for large data transactions (big data) that require parallel processing for robust and prompt data analysis. While a number of HPC frameworks have been proposed, parallel programming models present a number of challenges, for instance, how to fully […]
Oct, 18
StreamWorks: An Energy-efficient Embedded Co-processor for Stream Computing
Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and […]
Oct, 18
Hybrid CPU-GPU Implementation of Tracking-Learning-Detection Algorithm
Tracking objects in a video stream is an important problem in robot learning (learning an object’s visual features from different perspectives as it moves, rotates, scales, and is subjected to some morphological changes such as erosion), defense, public security, and many other domains. In this thesis, we focus on a recently proposed tracking framework […]
Oct, 18
Cholla: A New Massively-Parallel Hydrodynamics Code For Astrophysical Simulation
We present Cholla (Computational Hydrodynamics On ParaLLel Architectures), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind (CTU) algorithm, a variety of exact and approximate Riemann solvers, and […]
Oct, 18
Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition
Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve a further performance improvement, this research investigates deep extensions of LSTM, considering that deep hierarchical models have turned out to be more efficient than shallow ones. Motivated by previous […]
Oct, 16
The Distribution of OpenCL Kernel Execution Across Multiple Devices
Many computer systems now include both CPUs and programmable GPUs. OpenCL, a new programming framework, can program individual CPUs or GPUs; however, distributing a problem across multiple devices is more difficult. This thesis contributes three OpenCL runtimes that automatically distribute a problem across multiple devices: DualCL and m2sOpenCL, which distribute tasks across a single system’s […]
Oct, 16
OpenCL Implementation of Montgomery Multiplication on FPGA
Galois Field arithmetic has been used very frequently in popular security and error-correction applications. Montgomery multiplication is among the suitable methods used for accelerating modular multiplication, which is the most time-consuming basic arithmetic operation. Montgomery multiplication is also well suited to parallel implementation. OpenCL, which is a portable, heterogeneous and parallel programming framework, […]
Oct, 16
Parallel Programming and Compressed Material Data for an Eulerian Code
We describe the problem of iterating over mesh zones and iterating over material data within a zone, in the context of relatively new compute architectures. We present an example of how this can be done in a way that is portable across parallel programming environments and can be made to perform well. We offer a […]