
Posts

Jul, 3

On the Use of GPUs in Realizing Cost-Effective Distributed RAID

The exponential growth in user and application data entails new means for providing fault tolerance and protection against data loss. High Performance Computing (HPC) storage systems, which are at the forefront of handling the data deluge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and […]
Jul, 2

kANN on the GPU with Shifted Sorting

We describe the implementation of a simple method for finding k approximate nearest neighbors (ANNs) on the GPU. While the performance of most ANN algorithms depends heavily on the distributions of the data and query points, our approach has a very regular data access pattern. It performs as well as state-of-the-art methods […]
Jul, 2

Acceleration of bilateral filtering algorithm for manycore and multicore architectures

This work explores multicore and manycore acceleration for the embarrassingly parallel, compute-intensive bilateral filtering kernel. For manycore architectures, we have created a pair-symmetric algorithm to avoid redundant calculations. For multicore architectures, we improve the algorithm by using low-level single instruction, multiple data (SIMD) parallelism across multiple threads. We propose architecture-specific optimizations, such […]
Jul, 2

Deformation of skeleton based implicit objects

In this paper we present a precise contact modeling environment for skeleton-based implicit objects. To render the scene composed of these implicit objects, we have implemented the state-of-the-art raycasting algorithm, called marching points, on the GPU using CUDA. Further, we introduce how to interactively deform the implicit objects when they collide. To achieve this we […]
Jul, 2

Halo Gathering Scalability for Large Scale Multi-dimensional Sznajd Opinion Models Using Data Parallelism with GPUs

The Sznajd model of opinion formation exhibits complex phase transitional and growth behaviour and can be studied with numerical simulations on a number of different network structures. Large system sizes and detailed statistical sampling of the model both require data-parallel computing to accelerate simulation performance. Data structures and computational performance issues are reported for simulations […]
Jul, 2

Computationally Efficient Algorithms for Evaluation of Statistical Descriptors

Homogenization methods are becoming the most popular approach to modelling heterogeneous materials. The main principle is to represent the heterogeneous microstructure with an equivalent homogeneous material. When dealing with complex random microstructures, the unit cell representing exactly periodic morphology needs to be replaced by a statistically equivalent periodic unit cell (SEPUC) preserving the […]
Jul, 2

API-Compiling for Image Hardware Accelerators

We present an API-based compilation strategy to optimize image applications, developed using a high level image processing library, onto three different image processing hardware accelerators. The library API provides the semantics of the image computations. The three image accelerator targets are quite distinct: the first one uses a vector architecture; the second one presents a […]
Jul, 2

Parallelization Strategies of the Canny Edge Detector for Multi-core CPUs and Many-core GPUs

In this paper we study two parallelization strategies (loop-level parallelism and domain decomposition), and we investigate their impact in terms of performance and scalability on two different parallel architectures. As a test application, we use the Canny Edge Detector due to its wide range of parallelization opportunities, and its frequent use in computer vision applications. […]
Jul, 1

The Fat-Link Computation On Large GPU Clusters for Lattice QCD

Graphics Processing Units (GPUs) are becoming increasingly popular in high performance computing due to their high performance, high power efficiency and low cost. In this paper, we present results of an effort to implement the fat-link computation – an important component of many lattice quantum chromodynamics (LQCD) calculations – on GPU clusters using the QUDA […]
Jul, 1

Fault Tree Analysis Speed-up with GPU Parallel Computing

The reliability analysis of critical systems can be performed using fault tree analysis. One of the common approaches used for fault tree analysis is Monte Carlo simulation. The purpose of this paper is therefore to show an algorithm to speed up Monte Carlo simulation for analyzing fault trees with parallel computing on the GPU. To this […]
Jul, 1

CUDA-accelerated Hierarchical K-means

In 2011 alone, more than 350 billion photos were generated. Thus, it is indispensable to use statistical tools, such as clustering, for managing data. K-Means is one of the most used clustering methods because it is easy to implement. However, when the number of clusters grows larger, the speed of K-Means becomes […]
Jun, 30

A Scheduling Framework for a Heterogeneous Parallel Architecture

Scheduling on heterogeneous parallel and distributed computing environments has been studied for decades. Based on different assumptions, researchers have proposed several algorithms and heuristics aiming to improve the performance of parallel applications. Most of these works focus on clusters of CPUs or grid-based environments where heterogeneity is created by processors and networks of varying speeds. […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
