Posts

Nov, 5

An intelligent semi-automatic application porting system for application accelerators

Work involving application acceleration devices is showing great promise; however, major obstacles still prevent their widespread adoption. Currently, porting applications to an accelerator requires expertise in both the computer science and application domains, due to the lack of available abstraction. We present our work associated with the […]
Nov, 5

An experimental approach to performance measurement of heterogeneous parallel applications using CUDA

Heterogeneous parallel systems using GPU devices for application acceleration have garnered significant attention in the supercomputing community. However, to realize the full potential of GPU computing, application developers will require tools to measure and analyze accelerator performance with respect to the parallel execution as a whole. A performance measurement technology for the NVIDIA CUDA platform […]
Nov, 5

Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping

Because of their tremendous computing power and remarkable cost efficiency, GPUs (graphics processing units) have quickly emerged as an influential platform for high-performance computing. However, as GPUs are designed for massive data-parallel computing, their performance is sensitive to the presence of conditional statements in a GPU application. On a conditional branch where […]
Nov, 5

GPU-Accelerated Nearest Neighbor Search for 3D Registration

Nearest Neighbor Search (NNS) is employed by many computer vision algorithms. Its computational complexity is high and poses a challenge for real-time capability. The basic problem is rapidly processing a huge amount of data, which is often addressed by means of highly sophisticated search methods and parallelism. We show that NNS-based vision algorithms […]
Nov, 5

Debugging GPU stream programs through automatic dataflow recording and visualization

We present a novel framework for debugging GPU stream programs through automatic dataflow recording and visualization. Our debugging system can help programmers locate errors that are common in general purpose stream programs but very difficult to debug with existing tools. A stream program is first compiled into an instrumented program using a compiler. This instrumenting […]
Nov, 5

GPU for Parallel On-Board Hyperspectral Image Processing

Hyperspectral analysis algorithms exhibit inherent parallelism at multiple levels, and map well onto high-performance systems such as massively parallel clusters and networks of computers. Unfortunately, these systems are generally expensive and difficult to adapt to onboard data processing scenarios, in which low-weight and low-power integrated components are desirable to reduce mission payload. An exciting […]
Nov, 5

RenderAnts: Interactive REYES Rendering on GPUs

We present RenderAnts, the first system that enables interactive REYES rendering on GPUs. Taking RenderMan scenes and shaders as input, our system first compiles RenderMan shaders to GPU shaders. Then all stages of the basic REYES pipeline, including bounding/splitting, dicing, shading, sampling, compositing and filtering, are executed on GPUs using carefully designed data-parallel algorithms. Advanced […]
Nov, 5

Accelerating MATLAB Image Processing Toolbox functions on GPUs

In this paper, we present our effort in developing an open-source GPU (graphics processing unit) code library for the MATLAB Image Processing Toolbox (IPT). We ported a dozen representative functions from IPT and, based on their inherent characteristics, grouped these functions into four categories: data independent, data sharing, algorithm dependent and data dependent. […]
Nov, 5

Accelerating advanced MRI reconstructions on GPUs

Computational acceleration on graphics processing units (GPUs) can make advanced magnetic resonance imaging (MRI) reconstruction algorithms attractive in clinical settings, thereby improving the quality of MR images across a broad spectrum of applications. This paper describes the acceleration of such an algorithm on NVIDIA […]
Nov, 5

Efficient computation of sum-products on GPUs through software-managed cache

We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped with close-to-ALU software-managed memory. The approach is based on the efficient use of this memory through the implementation of a software-managed cache. We also present an analytical model for performance analysis of such algorithms. We apply this […]
Nov, 5

Iterative induced dipoles computation for molecular mechanics on GPUs

In this work, we present a first step towards the efficient implementation of polarizable molecular mechanics force fields with GPU acceleration. The computational bottleneck of such applications is found in the treatment of electrostatics, where higher-order multipoles and a self-consistent treatment of polarization effects are needed. We have coded these sections, for the case of […]
Nov, 5

The Scalable Heterogeneous Computing (SHOC) benchmark suite

Scalable heterogeneous computing systems, which are composed of a mix of compute devices, such as commodity multicore processors, graphics processors, reconfigurable processors, and others, are gaining attention as one approach to continuing performance improvement while managing the new challenge of energy efficiency. As these systems become more common, it is important to be able to […]

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
