1864 Posts

Nov, 28

On the efficiency of iterative ordered subset reconstruction algorithms for acceleration on GPUs

Expectation Maximization (EM) and the Simultaneous Iterative Reconstruction Technique (SIRT) are two iterative computed tomography reconstruction algorithms often used when the data contain a high amount of statistical noise, have been acquired from a limited angular range, or have a limited number of views. A popular mechanism to increase the rate of convergence of these […]
Nov, 28

Parallel LDPC Decoding on GPUs Using a Stream-Based Computing Approach

Low-Density Parity-Check (LDPC) codes are powerful error correcting codes adopted by recent communication standards. LDPC decoders are based on belief propagation algorithms, which make use of a Tanner graph and very intensive message-passing computation, and usually require hardware-based dedicated solutions. With the exponential increase of the computational power of commodity graphics processing units (GPUs), […]
Nov, 28

Time-varying clustering for local lighting and material design

This paper presents an interactive graphics processing unit (GPU)-based relighting system in which local lighting condition, surface materials and viewing direction can all be changed on the fly. To support these changes, we simulate the lighting transportation process at run time, which is normally impractical for interactive use due to its huge computational burden. […]
Nov, 28

Shader-based tessellation to save memory bandwidth in a mobile multimedia processor

In this paper, we propose an architecture of tessellation hardware to save memory bandwidth in a mobile multimedia processor. To reduce the implementation overhead, floating-point computations of tessellation are accelerated by the conventional GPU pipeline, and only tessellation-specific control logic is handled by an additional hardware unit. Tightly coupled with a vertex shader, the additional […]
Nov, 28

Complexity effective memory access scheduling for many-core accelerator architectures

Modern DRAM systems rely on memory controllers that employ out-of-order scheduling to maximize row access locality and bank-level parallelism, which in turn maximizes DRAM bandwidth. This is especially important in graphics processing unit (GPU) architectures, where the large quantity of parallelism places a heavy demand on the memory system. The logic needed for out-of-order scheduling […]
Nov, 27

2011 Symposium on Application Accelerators in High Performance Computing (SAAHPC’11)

What do GPUs, FPGAs, vector processors and other exotic special-purpose chips have in common? They are advanced processor architectures that the scientific community is using to accelerate computationally demanding applications. While high-performance computing systems that use application accelerators are still rare, they will be the norm rather than the exception in the near future. The […]
Nov, 27

Interactive Pixel-Accurate Free Viewpoint Rendering from Images with Silhouette Aware Sampling

We present an integrated, fully GPU-based processing pipeline to interactively render new views of arbitrary scenes from calibrated but otherwise unstructured input views. In a two-step procedure, our method first generates for each input view a dense proxy of the scene using a new multi-view stereo formulation. Each scene proxy consists of a structured cloud […]
Nov, 27

Aurally and visually enhanced audio search with soundtorch

Finding a specific or an artistically appropriate sound in a vast collection comprising thousands of audio files containing recordings of, say, footsteps, gunshots, and thunderclaps easily becomes a chore. To improve on this, we have developed an enhanced auditory and graphical zoomable user interface that leverages the human brain’s capability to single out sounds from […]
Nov, 27

A Massively Parallel Architecture for Bioinformatics

Today’s general-purpose computers fall short of the computing performance required for standard bioinformatics applications such as DNA sequence alignment, error correction for assembly, or TFBS finding. The size of DNA sequence databases doubles twice a year. Computing performance per unit cost, on the other hand, doubles only every two years. […]
Nov, 27

Evenly Spaced Streamlines for Surfaces: An Image-Based Approach

We introduce a novel, automatic streamline seeding algorithm for vector fields defined on surfaces in 3D space. The algorithm generates evenly spaced streamlines fast, simply and efficiently for any general surface-based vector field. It is general because it handles large, complex, unstructured, adaptive resolution grids with holes and discontinuities, does not require a parametrization, […]
Nov, 27

Accelerating error correction in high-throughput short-read DNA sequencing data with CUDA

Emerging DNA sequencing technologies open up exciting new opportunities for genome sequencing by generating read data with a massive throughput. However, produced reads are significantly shorter and more error-prone compared to the traditional Sanger shotgun sequencing method. This poses challenges for de-novo DNA fragment assembly algorithms in terms of both accuracy (to deal with short, […]
Nov, 27

Parallel reconstruction of neighbor-joining trees for large multiple sequence alignments using CUDA

Computing large multiple protein sequence alignments using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. ClustalW uses a three-stage processing pipeline: (i) pairwise distance computation; (ii) phylogenetic tree reconstruction; and (iii) progressive multiple alignment computation. Previous work on accelerating ClustalW was mainly focused on parallelizing the first stage and achieved good […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
