Posts
Nov, 30
Fractal Based Method on Hardware Acceleration for Natural Environments
Natural scenes from the real world are highly complex, so modeling and rendering natural shapes such as mountains, trees and clouds is difficult, time-consuming and memory-intensive. Intuitively, the critical characteristic of natural scenes is their self-similarity. Motivated by the self-similarity feature of the […]
Nov, 30
Digitize Your Body and Action in 3-D at Over 10 FPS: Real Time Dense Voxel Reconstruction and Marker-less Motion Tracking via GPU Acceleration
In this paper, we present an approach to reconstruct 3-D human motion from multiple cameras and track the human skeleton using the reconstructed 3-D point (voxel) cloud. We use an improved and more robust algorithm, probabilistic shape from silhouette, to reconstruct the human voxels. In addition, the annealed particle filter is applied for tracking, where the measurement […]
Nov, 30
Design and Storage Optimization of GPU-based Parallel Program of Image Registration for Remote Sensing
Image registration is a crucial step in many remote sensing applications. As the scale of the data and the complexity of the algorithms keep growing, image registration faces great challenges in processing speed. In recent years, the computing capacity of GPUs has improved greatly. Taking advantage of GPUs to solve general-purpose problems, we research […]
Nov, 30
GPU Accelerated Parallel Occupancy Voxel Based ICP for Position Tracking
Tracking the position of a robot in an unknown environment is an important problem in robotics. Iterative closest point algorithms using range data are commonly used for position tracking, but can be computationally intensive. We describe a highly parallel occupancy grid iterative closest point position tracking algorithm designed for use on a GPU, that uses […]
Nov, 29
Highly Optimized Full GPU-Acceleration of Non-hydrostatic Weather Model SCALE-LES
SCALE-LES is a non-hydrostatic weather model developed at RIKEN, Japan. It is intended to be a global high-resolution model that would be scaled to exascale systems. This paper introduces the full GPU acceleration of all SCALE-LES modules. Moreover, the paper demonstrates the strategies to handle the unique challenges of accelerating SCALE-LES using GPUs. The […]
Nov, 29
Benchmarking Parallel Performance on Many-Core Processors
With the emergence of many-core processor architectures onto the HPC scene, concerns arise regarding the performance and productivity of numerous existing parallel-programming tools, models, and languages. As these devices begin augmenting conventional distributed cluster systems in an evolving age of heterogeneous supercomputing, proper evaluation and profiling of many-core processors must occur in order to understand […]
Nov, 28
The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?
Recently, parallel programming has become necessary in order to obtain performance gains, primarily due to power limitations. However, parallel architectures differ substantially from each other, often require specialized knowledge, and typically necessitate reimplementation and fine-tuning of application code. These slow tasks frequently result in situations where most of the time is spent reimplementing old […]
Nov, 28
The Use of Automated Search in Deriving Software Testing Strategies
Testing a software artefact using every one of its possible inputs would normally cost too much, and take too long, compared to the benefits of detecting faults in the software. Instead, a testing strategy is used to select a small subset of the inputs with which to test the software. The criterion used to select […]
Nov, 28
American Basket Option Pricing on a multi GPU Cluster
This article presents a multi-GPU adaptation of a specific Monte Carlo and classification-based method for pricing American basket options, due to Picazo [1]. The first part describes how to combine fine- and coarse-grained parallelization to price American basket options. In order to benefit from different GPU devices, a dynamic strategy of kernel […]
Nov, 28
Hybrid Programming using OpenSHMEM and OpenACC
With high-performance systems exploiting multicore and accelerator-based architectures on a distributed shared memory system, heterogeneous hybrid programming models are the natural choice to exploit all the hardware made available on these systems. Previous efforts looking into hybrid models have primarily focused on using OpenMP directives (for shared memory programming) with MPI (for inter-node programming […]
Nov, 27
Accelerated Primality Testing Using GPUs
The aim of this project was to port the FFT routines of LLRP to CUDA, which was done successfully. This success is quantified as the FFT portions of the program executing in a much shorter time than the FFTW transforms. The project shows that GPUs are certainly viable for use in numerical codes such as […]
Nov, 27
Autotuning of Pattern Runtimes for Accelerated Parallel Systems
Parallel architectures with node-level accelerators promise significant performance improvements over conventional homogeneous systems. To cope with the increased complexity of programming such systems, various pattern-based programming libraries have become available. In this paper, we present our work on providing autotuning capabilities for two runtime libraries that provide parallel programming patterns on state-of-the-art heterogeneous hardware. We […]