Posts
Jul, 2
Acceleration of bilateral filtering algorithm for manycore and multicore architectures
This work explores multicore and manycore acceleration for the embarrassingly parallel, compute-intensive bilateral filtering kernel. For manycore architectures, we have created a pair-symmetric algorithm to avoid redundant calculations. For multicore architectures, we improve the algorithm by using low-level single instruction, multiple data (SIMD) parallelism across multiple threads. We propose architecture-specific optimizations, such […]
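As a rough illustration of the pair-symmetric idea mentioned in this abstract, here is a minimal 1D sketch in plain Python. It is not the authors' implementation: the function names, the 1D signal, and the Gaussian weights are assumptions made for the example. It relies only on the fact that the bilateral weight between samples i and j is the same in both directions, so each pair can be evaluated once and added to both outputs.

```python
import math


def _weight(signal, i, j, sigma_s, sigma_r):
    # Gaussian spatial term times Gaussian range term; symmetric in (i, j).
    spatial = ((i - j) ** 2) / (2.0 * sigma_s ** 2)
    tonal = ((signal[i] - signal[j]) ** 2) / (2.0 * sigma_r ** 2)
    return math.exp(-spatial - tonal)


def bilateral_naive(signal, radius, sigma_s, sigma_r):
    # Reference version: every output sample recomputes all of its weights.
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = _weight(signal, i, j, sigma_s, sigma_r)
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out


def bilateral_pair_symmetric(signal, radius, sigma_s, sigma_r):
    # Hypothetical pair-symmetric version: each pair (i, j) with j > i is
    # evaluated once and its weight is accumulated into both samples.
    n = len(signal)
    num = [float(v) for v in signal]  # self-pair has weight exp(0) = 1
    den = [1.0] * n
    for i in range(n):
        for j in range(i + 1, min(n, i + radius + 1)):
            w = _weight(signal, i, j, sigma_s, sigma_r)
            num[i] += w * signal[j]
            den[i] += w
            num[j] += w * signal[i]
            den[j] += w
    return [a / b for a, b in zip(num, den)]
```

The symmetric version evaluates roughly half as many exponentials as the naive one. In a parallel setting the scattered updates to `num[j]` and `den[j]` would need atomics or a careful work partition, which this serial sketch does not address.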
Jul, 2
Deformation of skeleton based implicit objects
In this paper we present a precise contact modeling environment for skeleton-based implicit objects. To render a scene composed of these implicit objects, we have implemented the state-of-the-art ray-casting algorithm, called marching points, on the GPU using CUDA. Further, we show how to interactively deform the implicit objects when they collide. To achieve this we […]
Jul, 2
Halo Gathering Scalability for Large Scale Multi-dimensional Sznajd Opinion Models Using Data Parallelism with GPUs
The Sznajd model of opinion formation exhibits complex phase transitional and growth behaviour and can be studied with numerical simulations on a number of different network structures. Large system sizes and detailed statistical sampling of the model both require data-parallel computing to accelerate simulation performance. Data structures and computational performance issues are reported for simulations […]
Jul, 2
Computationally Efficient Algorithms for Evaluation of Statistical Descriptors
Homogenization methods are becoming the most popular approach to modelling heterogeneous materials. The main principle is to represent the heterogeneous microstructure with an equivalent homogeneous material. When dealing with complex random microstructures, the unit cell representing an exactly periodic morphology needs to be replaced by a statistically equivalent periodic unit cell (SEPUC) preserving the […]
Jul, 2
API-Compiling for Image Hardware Accelerators
We present an API-based compilation strategy to optimize image applications, developed using a high level image processing library, onto three different image processing hardware accelerators. The library API provides the semantics of the image computations. The three image accelerator targets are quite distinct: the first one uses a vector architecture; the second one presents a […]
Jul, 2
Parallelization Strategies of the Canny Edge Detector for Multi-core CPUs and Many-core GPUs
In this paper we study two parallelization strategies (loop-level parallelism and domain decomposition), and we investigate their impact in terms of performance and scalability on two different parallel architectures. As a test application, we use the Canny Edge Detector due to its wide range of parallelization opportunities, and its frequent use in computer vision applications. […]
Jul, 1
The Fat-Link Computation On Large GPU Clusters for Lattice QCD
Graphics Processing Units (GPU) are becoming increasingly popular in high performance computing due to their high performance, high power efficiency and low cost. In this paper, we present results of an effort to implement the fatlink computation – an important component of many lattice quantum chromodynamics (LQCD) calculations – on GPU clusters using the QUDA […]
Jul, 1
Fault Tree Analysis Speed-up with GPU Parallel Computing
The reliability analysis of critical systems can be performed using fault tree analysis. One of the common approaches used for fault tree analysis is Monte Carlo simulation. The purpose of this paper is therefore to show an algorithm that speeds up Monte Carlo simulation for analyzing fault trees with parallel computing on a GPU. To this […]
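To make the Monte Carlo approach named in this abstract concrete, here is a minimal serial sketch in Python. The tree shape, the three basic events, and all function names are assumptions chosen for illustration and are not taken from the paper. Each trial samples every basic event independently and evaluates the gates, and that independence between trials is what makes the method a natural fit for data-parallel hardware.

```python
import random


def top_event(a, b, c):
    # Hypothetical tree: TOP = OR(AND(A, B), C).
    return (a and b) or c


def estimate_top_probability(p_a, p_b, p_c, trials, seed=0):
    # Estimate P(TOP) as the fraction of trials in which the top event occurs.
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        # Sample each basic event as an independent Bernoulli variable.
        a = rng.random() < p_a
        b = rng.random() < p_b
        c = rng.random() < p_c
        if top_event(a, b, c):
            failures += 1
    return failures / trials


if __name__ == "__main__":
    # Closed-form value for this small tree, for comparison with the estimate.
    exact = 1 - (1 - 0.1 * 0.2) * (1 - 0.05)
    print(estimate_top_probability(0.1, 0.2, 0.05, 200_000), exact)
```

For a tree this small the closed-form probability is available, so the estimate can be checked directly. Monte Carlo becomes worthwhile when the tree is too large or has too many shared events for exact evaluation to be practical.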
Jul, 1
CUDA-accelerated Hierarchical K-means
In 2011, more than 350 billion photos were generated in a single year. Thus, it is indispensable to use statistical tools, such as clustering, for managing data. K-Means is one of the most widely used clustering methods because it is easy to implement. However, when the number of clusters grows larger, the speed of K-Means becomes […]
Jun, 30
A Scheduling Framework for a Heterogeneous Parallel Architecture
Scheduling on heterogeneous parallel and distributed computing environments has been studied for decades. Based on different assumptions, researchers have proposed several algorithms and heuristics aiming to improve the performance of parallel applications. Most of these works focus on clusters of CPUs or grid-based environments where heterogeneity is created by processors and networks of varying speeds. […]
Jun, 30
High-performance blob-based iterative three-dimensional reconstruction in electron tomography using multi-GPUs
BACKGROUND: Three-dimensional (3D) reconstruction in electron tomography (ET) has emerged as a leading technique to elucidate the molecular structures of complex biological specimens. Blob-based iterative methods are advantageous for 3D reconstruction in ET, but incur huge computational costs. Multiple graphics processing units (multi-GPUs) offer an affordable platform to meet these demands. However, a […]
Jun, 30
Accelerating large-scale protein structure alignments with graphics processing units
BACKGROUND: Large-scale protein structure alignment, an indispensable tool in structural bioinformatics, poses a tremendous challenge for computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level […]