Posts
Apr, 25
Comparison of Different Parallel Implementations of the 2+1-Dimensional KPZ Model and the 3-Dimensional KMC Model
We show that efficient simulations of Kardar-Parisi-Zhang (KPZ) interface growth in 2+1 dimensions and of 3-dimensional Kinetic Monte Carlo (KMC) simulations of thermally activated diffusion can be realized on both GPUs and modern CPUs. In this article we present results of different implementations on GPUs using CUDA and OpenCL and also on CPUs using […]
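The excerpt does not include implementation details; as a rough illustration, a checkerboard-parallelized stochastic growth update on a 2D height lattice is one common way to run KPZ-class lattice models on GPUs. The CUDA sketch below uses that pattern; the lattice size, deposition probability and growth rule are illustrative assumptions, not the authors' specific model or code.

```cuda
// Minimal sketch: checkerboard-parallel stochastic growth on an L x L height
// lattice. Sites of one sublattice are updated in parallel, so no two threads
// ever touch neighbouring sites in the same pass. Lattice size, deposition
// probability and the growth rule are illustrative assumptions.
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

#define L 256            // lattice size (assumption)
#define P_DEPOSIT 0.98f  // deposition probability (assumption)

__global__ void init_rng(curandState *st, unsigned long long seed) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < L * L) curand_init(seed, id, 0, &st[id]);
}

__global__ void growth_step(int *h, curandState *st, int parity) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= L || y >= L || ((x + y) & 1) != parity) return;

    int id = y * L + x;
    int hl = h[y * L + (x + L - 1) % L];          // periodic neighbours
    int hr = h[y * L + (x + 1) % L];
    int hd = h[((y + L - 1) % L) * L + x];
    int hu = h[((y + 1) % L) * L + x];
    int hc = h[id];

    // Deposit at local minima with probability P_DEPOSIT -- a simple
    // restricted solid-on-solid style rule used here purely for illustration.
    if (hc <= hl && hc <= hr && hc <= hd && hc <= hu &&
        curand_uniform(&st[id]) < P_DEPOSIT)
        h[id] = hc + 1;
}

int main() {
    int *h;
    curandState *st;
    cudaMalloc(&h, L * L * sizeof(int));
    cudaMemset(h, 0, L * L * sizeof(int));
    cudaMalloc(&st, L * L * sizeof(curandState));
    init_rng<<<(L * L + 255) / 256, 256>>>(st, 1234ULL);

    dim3 block(16, 16), grid(L / 16, L / 16);
    for (int t = 0; t < 1000; ++t) {                 // 1000 sweeps (assumption)
        growth_step<<<grid, block>>>(h, st, 0);      // even sublattice
        growth_step<<<grid, block>>>(h, st, 1);      // odd sublattice
    }
    cudaDeviceSynchronize();
    printf("growth finished\n");
    cudaFree(h);
    cudaFree(st);
    return 0;
}
```

The two-pass (even/odd) update is the usual way to avoid write conflicts between neighbouring lattice sites on a massively parallel device.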
Apr, 25
Paraiso: An Automated Tuning Framework for Explicit Solvers of Partial Differential Equations
We propose Paraiso, a domain-specific language embedded in the functional programming language Haskell, for automated tuning of explicit solvers of partial differential equations (PDEs) on GPUs as well as multicore CPUs. In Paraiso, one can describe PDE-solving algorithms succinctly using tensor equation notation. Hydrodynamic properties, interpolation methods and other building blocks are described in […]
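Paraiso generates tuned GPU and CPU code from high-level tensor equations written in Haskell; as a point of reference, the kind of explicit PDE update such a framework ultimately has to produce looks roughly like the hand-written CUDA sketch below, a forward-Euler step of the 2D heat equation. The grid size, diffusion coefficient and time step are arbitrary illustrative values.

```cuda
// Minimal sketch of an explicit PDE update of the kind such a framework
// targets: one forward-Euler step of the 2D heat equation
// u_t = alpha * (u_xx + u_yy) on a regular grid.
#include <cstdio>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

#define N 512
#define ALPHA 0.1f
#define DT 0.1f
#define DX 1.0f

__global__ void heat_step(const float *u, float *u_new) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i <= 0 || j <= 0 || i >= N - 1 || j >= N - 1) return;  // fixed boundary
    int id = j * N + i;
    float lap = (u[id - 1] + u[id + 1] + u[id - N] + u[id + N] - 4.0f * u[id])
                / (DX * DX);
    u_new[id] = u[id] + DT * ALPHA * lap;  // explicit forward-Euler update
}

int main() {
    std::vector<float> host(N * N, 0.0f);
    host[(N / 2) * N + N / 2] = 100.0f;    // a single hot spot

    float *u, *u_new;
    cudaMalloc(&u, N * N * sizeof(float));
    cudaMalloc(&u_new, N * N * sizeof(float));
    cudaMemcpy(u, host.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(u_new, host.data(), N * N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(16, 16), grid((N + 15) / 16, (N + 15) / 16);
    for (int t = 0; t < 100; ++t) {
        heat_step<<<grid, block>>>(u, u_new);
        std::swap(u, u_new);               // ping-pong the two buffers
    }
    cudaMemcpy(host.data(), u, N * N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("centre value after 100 steps: %f\n", host[(N / 2) * N + N / 2]);
    cudaFree(u);
    cudaFree(u_new);
    return 0;
}
```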
Apr, 24
Real-time video breakup detection for multiple HD video streams on a single GPU
An important task in film and video preservation is the quality assessment of the content to be archived or reused out of the archive. Done manually, this is a straining and time-consuming process, so it is highly desirable to automate it as far as possible. In this paper, we show how […]
Apr, 23
Performance Degradation Analysis of GPU Kernels
Hardware accelerators (currently Graphical Processing Units, or GPUs) are an important component in many existing high-performance computing solutions [5]. Their growth in variety and usage is expected to skyrocket [1] for many reasons. First, GPUs offer impressive energy efficiency [3]. Second, when properly programmed, they yield impressive speedups by allowing programmers to model their […]
Apr, 23
Tree Structured Analysis on GPU Power Study
Graphics Processing Units (GPUs) have emerged as a promising platform for parallel computation. With a large number of processor cores and abundant memory bandwidth, GPUs deliver substantial computational power. While providing this high performance, a GPU draws considerable power and requires an adequate power supply and cooling system. It is essential to institute an efficient mechanism […]
Apr, 23
High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures
This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened: the context-based data dependence, the memory-access dependence and the control dependence. The CAVLC pipeline is divided into […]
Apr, 23
Parallel Surface Reconstruction for Particle-Based Fluids
This paper presents a novel method that improves the efficiency of high-quality surface reconstructions for particle-based fluids using Marching Cubes. By constructing the scalar field only in a narrow band around the surface, the computational complexity and the memory consumption scale with the fluid surface instead of the volume. Furthermore, a parallel implementation of the […]
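As a rough sketch of the narrow-band idea, the CUDA kernel below scatters each particle's kernel-weighted contribution onto only the grid cells within its support radius, so untouched cells never enter the reconstruction; the falloff function, grid resolution and radius are illustrative assumptions rather than the paper's exact formulation.

```cuda
// Minimal sketch of building the scalar field only near particles: each
// particle scatters a kernel-weighted contribution to the grid cells inside
// its support radius (atomicAdd handles overlapping writes). Cells that never
// receive a contribution stay untouched, so the cost scales with the surface
// region rather than the full volume. Constants are illustrative assumptions.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define GRID 64            // grid resolution per axis (assumption)
#define CELL 1.0f          // cell size
#define RADIUS 2.5f        // particle support radius (assumption)

struct Particle { float x, y, z; };

__global__ void splat(const Particle *p, int n, float *field) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Particle q = p[i];
    int r = (int)ceilf(RADIUS / CELL);
    int cx = (int)(q.x / CELL), cy = (int)(q.y / CELL), cz = (int)(q.z / CELL);
    for (int dz = -r; dz <= r; ++dz)
      for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            int gx = cx + dx, gy = cy + dy, gz = cz + dz;
            if (gx < 0 || gy < 0 || gz < 0 ||
                gx >= GRID || gy >= GRID || gz >= GRID) continue;
            float px = gx * CELL - q.x, py = gy * CELL - q.y, pz = gz * CELL - q.z;
            float d2 = px * px + py * py + pz * pz;
            if (d2 > RADIUS * RADIUS) continue;
            // Simple smooth falloff; real SPH reconstructions use a proper
            // smoothing kernel here.
            float w = 1.0f - d2 / (RADIUS * RADIUS);
            atomicAdd(&field[(gz * GRID + gy) * GRID + gx], w * w);
        }
}

int main() {
    std::vector<Particle> host = { {20.f, 20.f, 20.f}, {21.f, 20.f, 20.f} };
    Particle *p; float *field;
    cudaMalloc(&p, host.size() * sizeof(Particle));
    cudaMalloc(&field, GRID * GRID * GRID * sizeof(float));
    cudaMemset(field, 0, GRID * GRID * GRID * sizeof(float));
    cudaMemcpy(p, host.data(), host.size() * sizeof(Particle),
               cudaMemcpyHostToDevice);

    splat<<<((int)host.size() + 127) / 128, 128>>>(p, (int)host.size(), field);
    cudaDeviceSynchronize();
    // The non-zero cells of `field` form the narrow band that a Marching
    // Cubes pass would then triangulate.
    printf("splatting done\n");
    cudaFree(p); cudaFree(field);
    return 0;
}
```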
Apr, 23
Memory Bandwidth Efficient Two-Dimensional Fast Fourier Transform Algorithm and Implementation for Large Problem Sizes
Prevailing VLSI trends point to a growing gap between the scaling of on-chip processing throughput and off-chip memory bandwidth. An efficient use of memory bandwidth must become a first-class design consideration in order to fully utilize the processing capability of highly concurrent processing platforms like FPGAs. In this paper, we present key aspects of this […]
Apr, 21
Computing Performance Benchmarks among CPU, GPU, and FPGA
In recent years, the world of high-performance computing has been developing rapidly. The goal of this project was to conduct computing performance benchmarks on three major computing platforms: CPUs, GPUs, and FPGAs. A total of 66 benchmarks were evaluated. GPUs outperformed the other platforms in terms of execution time. CPUs outperformed in overall execution […]
Apr, 21
Fast Universal Background Model (UBM) Training on GPUs using Compute Unified Device Architecture (CUDA)
Universal Background Modeling (UBM) is an alternative-hypothesis modeling approach that is used extensively in Speaker Verification (SV) systems. Training the background models from large speech data requires a significant amount of memory and computational load. In this paper, a parallel implementation of a speaker verification system based on Gaussian Mixture Modeling – Universal Background Modeling (GMM […]
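The bulk of the GMM-UBM training cost lies in scoring every feature frame against every mixture component. The CUDA sketch below shows that per-frame evaluation for diagonal covariances, with the feature dimension and component count chosen arbitrarily; it illustrates where the data parallelism lies and is not the paper's implementation.

```cuda
// Minimal sketch of the dominant cost in GMM-UBM training: evaluating every
// feature frame against every diagonal-covariance mixture component. One
// thread scores one frame; DIM and MIX are illustrative assumptions.
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define DIM 39      // MFCC feature dimension (assumption)
#define MIX 64      // number of mixture components (assumption)
#define TWO_PI_F 6.2831853f

__global__ void frame_loglik(const float *frames, int n_frames,
                             const float *mean, const float *var,
                             const float *logw, float *loglik) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n_frames) return;
    const float *x = &frames[t * DIM];

    float comp[MIX];
    float best = -1e30f;
    for (int m = 0; m < MIX; ++m) {
        float s = logw[m];  // log mixture weight
        for (int d = 0; d < DIM; ++d) {
            float diff = x[d] - mean[m * DIM + d];
            s -= 0.5f * (diff * diff / var[m * DIM + d]
                         + logf(TWO_PI_F * var[m * DIM + d]));
        }
        comp[m] = s;
        best = fmaxf(best, s);
    }
    // log-sum-exp over components gives the frame log-likelihood.
    float acc = 0.0f;
    for (int m = 0; m < MIX; ++m) acc += expf(comp[m] - best);
    loglik[t] = best + logf(acc);
}

int main() {
    const int n_frames = 1000;
    std::vector<float> f(n_frames * DIM, 0.1f), mu(MIX * DIM, 0.0f),
                       v(MIX * DIM, 1.0f), lw(MIX, logf(1.0f / MIX));
    float *df, *dmu, *dv, *dlw, *dll;
    cudaMalloc(&df, f.size() * sizeof(float));
    cudaMalloc(&dmu, mu.size() * sizeof(float));
    cudaMalloc(&dv, v.size() * sizeof(float));
    cudaMalloc(&dlw, lw.size() * sizeof(float));
    cudaMalloc(&dll, n_frames * sizeof(float));
    cudaMemcpy(df, f.data(), f.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dmu, mu.data(), mu.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dv, v.data(), v.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dlw, lw.data(), lw.size() * sizeof(float), cudaMemcpyHostToDevice);

    frame_loglik<<<(n_frames + 127) / 128, 128>>>(df, n_frames, dmu, dv, dlw, dll);
    cudaDeviceSynchronize();
    printf("scored %d frames\n", n_frames);
    cudaFree(df); cudaFree(dmu); cudaFree(dv); cudaFree(dlw); cudaFree(dll);
    return 0;
}
```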
Apr, 21
An On-Demand Fast Parallel Pseudo Random Number Generator with Applications
The good programmability of manycore architectures and accelerators, such as GPUs, has allowed them to be deployed for vital computational work. The ability to use randomness in computation is known to help in several situations. For such computations to be made possible on a general-purpose computer, a source of randomness, or in […]
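Without going into the authors' specific generator, a minimal counter-based sketch of parallel pseudo random number generation on a GPU is shown below: each thread hashes (seed, thread id, counter) to produce its own stream, so no generator state needs to be shared between threads. The mixing function and output layout are illustrative assumptions.

```cuda
// Minimal sketch of counter-based parallel random number generation: each
// thread derives independent values by hashing (seed, thread id, counter).
// The mixer and layout are illustrative, not the paper's generator.
#include <cstdio>
#include <cuda_runtime.h>

__device__ unsigned int mix32(unsigned int x) {
    // An integer finalizer-style mixer; any good avalanche function works here.
    x ^= x >> 16;  x *= 0x7feb352dU;
    x ^= x >> 15;  x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

__global__ void fill_uniform(float *out, int per_thread, unsigned int seed) {
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int k = 0; k < per_thread; ++k) {
        unsigned int r = mix32(seed ^ mix32(tid * 0x9e3779b9U + (unsigned)k));
        out[tid * per_thread + k] = r * (1.0f / 4294967296.0f);  // map to [0,1)
    }
}

int main() {
    const int threads = 1024, per_thread = 16;
    float *d_out;
    cudaMalloc(&d_out, threads * per_thread * sizeof(float));
    fill_uniform<<<threads / 256, 256>>>(d_out, per_thread, 12345u);

    float sample[4];
    cudaMemcpy(sample, d_out, sizeof(sample), cudaMemcpyDeviceToHost);
    printf("%f %f %f %f\n", sample[0], sample[1], sample[2], sample[3]);
    cudaFree(d_out);
    return 0;
}
```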
Apr, 21
Image Convolution Processing: a GPU versus FPGA Comparison
Convolution is one of the most important operators used in image processing. With the constant need to increase performance in high-end applications, and the rise in popularity of parallel architectures such as GPUs and those implemented in FPGAs, comes the necessity to compare these architectures in order to determine which of them performs […]
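As a baseline for what is being compared, a minimal CUDA sketch of 2D image convolution is shown below: each thread computes one output pixel from its KxK neighbourhood. The image size, filter size and clamp-to-edge border handling are illustrative choices, not those of the paper.

```cuda
// Minimal sketch of 2D image convolution: one thread per output pixel,
// with the filter held in constant memory and a clamp-to-edge border.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define W 1024
#define H 768
#define K 3           // 3x3 filter (assumption)

__constant__ float d_filter[K * K];

__global__ void convolve(const float *in, float *out) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    float acc = 0.0f;
    for (int j = 0; j < K; ++j)
        for (int i = 0; i < K; ++i) {
            int sx = min(max(x + i - K / 2, 0), W - 1);   // clamp to edge
            int sy = min(max(y + j - K / 2, 0), H - 1);
            acc += in[sy * W + sx] * d_filter[j * K + i];
        }
    out[y * W + x] = acc;
}

int main() {
    std::vector<float> img(W * H, 1.0f);
    float blur[K * K];
    for (int i = 0; i < K * K; ++i) blur[i] = 1.0f / (K * K);  // box blur

    float *d_in, *d_out;
    cudaMalloc(&d_in, W * H * sizeof(float));
    cudaMalloc(&d_out, W * H * sizeof(float));
    cudaMemcpy(d_in, img.data(), W * H * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_filter, blur, sizeof(blur));

    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    convolve<<<grid, block>>>(d_in, d_out);
    cudaMemcpy(img.data(), d_out, W * H * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", img[0]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The same per-pixel formulation maps naturally to an FPGA pipeline, which is what makes convolution a convenient workload for cross-platform comparisons like this one.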