1869

Posts

Nov, 28

Depth map enhanced macroblock partitioning for H.264 video coding of computer graphics content

In this paper, we present a method to speed up video encoding of GPU rendered scenes. Modern video codecs, like H.264/AVC, are based on motion compensation and support partitioning of macroblocks, e.g. 16×16, 16×8, 8×8, 8×4 etc. In general, encoders use expensive search methods to determine suitable motion vectors and compare the rate-distortion score for […]
Nov, 28

Exploring Reconfigurable Architectures for Tree-Based Option Pricing Models

This article explores the application of reconfigurable hardware to the acceleration of financial computation using tree-based pricing models. Two parallel pipelined architectures have been developed for option valuation using binomial trees and trinomial trees, with support for concurrent evaluation of independent options to achieve high pricing throughput. Our results show that the tree-based models executing […]
Nov, 28

42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence

As an entry for the 2009 Gordon Bell price/performance prize, we present the results of two different hierarchical N-body simulations on a cluster of 256 graphics processing units (GPUs). Unlike many previous N-body simulations on GPUs that scale as O(N^2), the present method calculates the O(N log N) treecode and O(N) fast multipole method (FMM) […]
Nov, 28

Real-time restoration algorithm based on one-dimensional Wiener filters for different rates of image motion blur

To eliminate side-oblique image motion, a fast image algorithm is proposed for implementation on aerial camera systems. When an aerial camera works at a side-oblique angle, much parallel image motion with different rates will occur on the focal plane array simultaneously. Through analysis of how different rates of parallel image motion blur are generated and […]
Nov, 28

A shared-scene-graph image-warping architecture for VR: Low latency versus image quality

Designing low end-to-end latency system architectures for virtual reality is still an open and challenging problem. We describe the design, implementation and evaluation of a client-server depth-image warping architecture that updates and displays the scene graph at the refresh rate of the display. Our approach works for scenes consisting of dynamic and interactive objects. The […]
Nov, 28

On the efficiency of iterative ordered subset reconstruction algorithms for acceleration on GPUs

Expectation Maximization (EM) and the Simultaneous Iterative Reconstruction Technique (SIRT) are two iterative computed tomography reconstruction algorithms often used when the data contain a high amount of statistical noise, have been acquired from a limited angular range, or have a limited number of views. A popular mechanism to increase the rate of convergence of these […]
Nov, 28

Parallel LDPC Decoding on GPUs Using a Stream-Based Computing Approach

Low-Density Parity-Check (LDPC) codes are powerful error correcting codes adopted by recent communication standards. LDPC decoders are based on belief propagation algorithms, which make use of a Tanner graph and very intensive message-passing computation, and usually require hardware-based dedicated solutions. With the exponential increase of the computational power of commodity graphics processing units (GPUs), […]
Nov, 28

Time-varying clustering for local lighting and material design

This paper presents an interactive graphics processing unit (GPU)-based relighting system in which local lighting condition, surface materials and viewing direction can all be changed on the fly. To support these changes, we simulate the lighting transportation process at run time, which is normally impractical for interactive use due to its huge computational burden. […]
Nov, 28

Shader-based tessellation to save memory bandwidth in a mobile multimedia processor

In this paper, we propose an architecture of tessellation hardware to save memory bandwidth in a mobile multimedia processor. To reduce the implementation overhead, floating-point computations of tessellation are accelerated by the conventional GPU pipeline, and only tessellation-specific control logic is handled by an additional hardware unit. Tightly coupled with a vertex shader, the additional […]
Nov, 28

Complexity effective memory access scheduling for many-core accelerator architectures

Modern DRAM systems rely on memory controllers that employ out-of-order scheduling to maximize row access locality and bank-level parallelism, which in turn maximizes DRAM bandwidth. This is especially important in graphics processing unit (GPU) architectures, where the large quantity of parallelism places a heavy demand on the memory system. The logic needed for out-of-order scheduling […]
Nov, 27

2011 Symposium on Application Accelerators in High Performance Computing (SAAHPC’11)

What do GPUs, FPGAs, vector processors and other exotic special-purpose chips have in common? They are advanced processor architectures that the scientific community is using to accelerate computationally demanding applications. While high-performance computing systems that use application accelerators are still rare, they will be the norm rather than the exception in the near future. The […]
Nov, 27

Aurally and visually enhanced audio search with soundtorch

Finding a specific or an artistically appropriate sound in a vast collection comprising thousands of audio files containing recordings of, say, footsteps, gunshots, and thunderclaps easily becomes a chore. To improve on this, we have developed an enhanced auditory and graphical zoomable user interface that leverages the human brain’s capability to single out sounds from […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org