Posts

Feb, 24

A novel sorting algorithm for many-core architectures based on adaptive bitonic sort

Adaptive bitonic sort is a well-known merge-based parallel sorting algorithm. It achieves optimal complexity using a complex tree-like data structure called a bitonic tree. Because of this, using adaptive bitonic sort together with other algorithms usually implies converting bitonic trees to arrays and vice versa. This makes adaptive bitonic sort inappropriate in the context […]
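
As a point of reference for the abstract above, here is a minimal sketch of the classic array-based bitonic sort. It is not the adaptive variant and not the algorithm proposed in the paper: it uses no bitonic tree and performs O(n log² n) comparisons rather than O(n log n). The function names are hypothetical, and the sketch assumes the input length is a power of two.

```python
def bitonic_merge(seq, ascending):
    """Sort a bitonic sequence (length a power of two) in the given direction."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    seq = list(seq)
    # Compare-exchange element i with element i + half.
    # On a GPU, each of these pairs could be handled by a separate thread.
    for i in range(half):
        if (seq[i] > seq[i + half]) == ascending:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    # Both halves are now bitonic and can be merged independently.
    return bitonic_merge(seq[:half], ascending) + bitonic_merge(seq[half:], ascending)


def bitonic_sort(values, ascending=True):
    """Sort a list whose length is a power of two (assumption of this sketch)."""
    n = len(values)
    if n <= 1:
        return list(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    half = n // 2
    # Sort the halves in opposite directions so their concatenation is bitonic.
    first = bitonic_sort(values[:half], True)
    second = bitonic_sort(values[half:], False)
    return bitonic_merge(first + second, ascending)
```

Everything here operates on plain arrays, which is the representation the abstract says other algorithms expect.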
Feb, 24

Reuse and Refactoring of GPU Kernels to Design Complex Applications

Developers of GPU kernels, such as FFT, linear solvers, etc., tune their code extensively in order to obtain optimal performance, making efficient use of different resources available on the GPU. Complex applications are composed of several such kernel components. The software engineering community has performed extensive research on component-based design to build generic and flexible […]
Feb, 24

Stargazer: Automated Regression-Based GPU Design Space Exploration

Graphics processing units (GPUs) are of increasing interest because they offer massive parallelism for high-throughput computing. While GPUs promise high peak performance, their challenge is a less-familiar programming model with more complex and irregular performance trade-offs than traditional CPUs or CMPs. In particular, modest changes in software or hardware characteristics can lead to large or […]
Feb, 24

Collision Detection of Triangle Meshes using GPU

Collision detection in physics engines often uses primitives such as spheres and boxes since collisions between these objects are straightforward to compute. More complicated objects can then be modeled using compounds of these simpler primitives. However, in the pursuit of making it easier to construct and simulate complicated objects, triangle meshes are a good alternative […]
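
To illustrate why primitive tests are considered straightforward, the following is a minimal sketch of two overlap checks. The function names are hypothetical, and the box test assumes axis-aligned boxes given by their minimum and maximum corners. It does not cover oriented boxes or the triangle-mesh method the thesis goes on to describe.

```python
def spheres_collide(center_a, radius_a, center_b, radius_b):
    """Two spheres overlap when the center distance is at most the sum of radii."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    reach = radius_a + radius_b
    # Compare squared values to avoid computing a square root.
    return dist_sq <= reach * reach


def boxes_collide(min_a, max_a, min_b, max_b):
    """Axis-aligned boxes overlap when their extents overlap on every axis."""
    return all(
        lo_a <= hi_b and lo_b <= hi_a
        for lo_a, hi_a, lo_b, hi_b in zip(min_a, max_a, min_b, max_b)
    )
```

Each test costs a handful of arithmetic operations per pair of objects.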
Feb, 24

Shredder: GPU-Accelerated Incremental Storage and Computation

Redundancy elimination using data deduplication and incremental data processing has emerged as an important technique to minimize storage and computation requirements in data center computing. In this paper, we present the design, implementation and evaluation of Shredder, a high performance content-based chunking framework for supporting incremental storage and computation systems. Shredder exploits the massively parallel […]
Feb, 23

CUDA Implementation in the EM Scattering of a Three-Layer Canopy

Calculating the EM scattered fields from a three-layer canopy imposes a heavy computational burden when the area becomes large, which severely limits the use of the traditional serial algorithm. With the development of graphics hardware, the Graphics Processing Unit (GPU) can be used to solve electromagnetic (EM) scattering problems in parallel. In this paper, the […]
Feb, 23

Speculative Parallelization on GPGPUs

This paper overviews the first speculative parallelization technique for GPUs that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. We perform misspeculation check on […]
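
The five phases named above can be sketched sequentially as follows. This is a hypothetical illustration only, not the paper's technique: the loop body (`a[writes[i]] = a[reads[i]] + 1`), the function name, and the recovery policy (re-executing serially from the first misspeculated iteration onward) are all assumptions made for this example, and the parallel computation is simulated with an ordinary loop.

```python
def speculative_loop(a, reads, writes):
    """Run a[writes[i]] = a[reads[i]] + 1 for each i, speculating independence."""
    n = len(reads)

    # 1. Scheduling: assign each iteration to its own (simulated) thread.
    schedule = range(n)

    # 2. Computation: every iteration reads the pre-loop state and buffers
    #    its result instead of writing it to memory directly.
    snapshot = list(a)
    buffered = [snapshot[reads[i]] + 1 for i in schedule]

    # 3. Misspeculation check: an iteration is wrong if it read a location
    #    that an earlier iteration wrote (a cross-iteration dependence).
    written = set()
    first_bad = n
    for i in schedule:
        if reads[i] in written:
            first_bad = i
            break
        written.add(writes[i])

    # 4. Result committing: apply buffered results, in iteration order,
    #    for the iterations that ran before the first misspeculation.
    result = list(a)
    for i in range(first_bad):
        result[writes[i]] = buffered[i]

    # 5. Misspeculation recovery: re-execute the remaining iterations serially.
    for i in range(first_bad, n):
        result[writes[i]] = result[reads[i]] + 1
    return result
```

When no iteration reads a location written by an earlier one, phase 5 does nothing and all work comes from the parallelizable phase 2. The index arrays stand in for the dynamic irregularities the abstract mentions, since the dependences are only known at run time.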
Feb, 23

Use of Multiple GPUs on Shared Memory Multiprocessors for Ultrasound Propagation Simulations

This paper outlines our effort to migrate a compute-intensive ultrasound propagation application being developed in Matlab to a cluster computer where each node has seven GPUs. Our goal is to perform realistic simulations in hours and minutes instead of weeks and days. In order to reach this goal we investigate architecture characteristics of […]
Feb, 23

Network Simulator Tools and GPU Parallel Systems

In this paper we discuss the possibilities for parallel implementations of network simulators. Specifically, we investigate the options for porting parts of the simulator to the GPU in order to utilize its resources and obtain faster simulations. We discuss a few issues that are poorly suited to the GPU architecture, and we propose a possible workaround for […]
Feb, 23

Virtual Texturing with WebGL

Until recently, achieving hardware-accelerated 3D content on web sites has only been possible through third-party plugins. The new HTML5 standard eliminates this restriction by adding native 3D rendering through the WebGL API. This technology brings established desktop applications online, bridging the gap between software platforms. This thesis investigates how to implement Virtual Texturing […]
Feb, 22

Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsic parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM for multiple GPUs has not been studied extensively and systematically. In this article, we […]
Feb, 22

GPU-Based Iterative Relative Fuzzy Connectedness Image Segmentation

This paper presents a parallel algorithm for the top-of-the-line member of the fuzzy connectedness algorithm family, namely the iterative relative fuzzy connectedness (IRFC) segmentation method. The IRFC algorithm, realized via the image foresting transform (IFT), is implemented using NVIDIA’s compute unified device architecture (CUDA) platform for segmenting large medical image data sets. […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors