Posts

Jun, 24

Accelerating Smith-Waterman on Heterogeneous CPU-GPU Systems

This paper describes the approach and the speedup obtained in performing Smith-Waterman database searches on heterogeneous platforms comprising multi-core CPU and multi-GPU systems. Most of the advanced and optimized versions of the Smith-Waterman algorithm have demonstrated remarkable speedup over NCBI BLAST versions, viz., SWPS3, based on x86 SSE2 instructions, and CUDASW++ v2.0, a CUDA implementation […]
Jun, 24

Optimization of a GPU Implementation of Multi-Dimensional RF Pulse Design Algorithm

Modern graphics processing units work efficiently in RF pulse design for MR image reconstruction, which could make image reconstruction quicker, with a higher signal-to-noise ratio (SNR) and higher resolution. This paper introduces the use of GPU techniques, such as multithreading, shared memory, memory coalescing, and constant memory, to optimize the performance of the conjugate gradient least […]
Jun, 24

Implementation of a Soft Morphological Filter Based on GPU Framework

An approach to executing an improved soft morphological filter (ISMF) on a Graphics Processing Unit (GPU) is presented in this paper. ISMF, regarded as an extension of the standard morphological operators, performs well in removing salt-and-pepper noise and Gaussian noise simultaneously. Because of the great number of data comparisons and transfers involved, it will take a long […]
Jun, 24

An algorithm for efficient computation of spatial impulse response on the GPU with application in ultrasound simulation

Computation of the spatial impulse response (SIR) is a time-consuming but fundamental step in the computation of the linear ultrasonic fields in homogeneous media and the scattering fields in the presence of non-homogeneity. In this paper, we present a new algorithm for the computation of the SIR which is suitable for parallelization on massively multiprocessing […]
Jun, 24

Design and implementation of a time-division multiplexing scan architecture using serializer and deserializer in GPU chips

We present the design and implementation details of a time-division demultiplexing/multiplexing based scan architecture using a serializer/deserializer. This is one of the key DFT features implemented on NVIDIA’s Fermi family GPU (Graphics Processing Unit) chips. We provide a comprehensive description of the architecture and specifications. We also depict a compact serializer/deserializer module design, test timing considerations, […]
Jun, 24

GPU-Based Parallel Signature Scanning and Hash Generation

Today, nearly every user of electronic devices is affected by threats. Computer viruses infect harmless programs and change the function of those programs. One defense against these threats is a virus scanner, which searches for signatures of known viruses within code and/or data. In this work, we present a novel approach to on-line virus scanning and […]
Jun, 23

Efficiently GPU-accelerating long kernel convolutions in 3-D DIRECT TOF PET reconstruction via a kernel decomposition scheme

The DIRECT approach for 3-D Time-of-Flight (TOF) PET reconstruction performs all iterative predictor-corrector operations directly in image space. A computational bottleneck here is the convolution with the long TOF (resolution) kernels. Accelerating this convolution operation using GPUs is very important, especially for spatially variant resolution kernels, which cannot be efficiently implemented in the Fourier domain. […]
Jun, 23

Real-time arbitrary view rendering on GPU from stereo video and time-of-flight camera

Generating in-between images from multiple views of a scene is a crucial task for both computer vision and computer graphics fields. Photorealistic rendering, 3DTV and robot navigation are some of many applications which benefit from arbitrary view synthesis, if it is achieved in real-time. GPUs excel in achieving high computation power by processing arrays of […]
Jun, 23

Rapid star map simulation based on GPU

A rapid star map simulation method based on the GPU is proposed and implemented. The method uses the vertex shaders of the OpenGL Shading Language to perform real-time calculation of star positions. Combined with display list technology, the results of the calculation were used to simulate the initial star maps, then the initial star maps […]
Jun, 23

Fast implementation of fully iterative scatter corrected OSEM for HRRT using GPU

Accurate scatter correction is especially important for high-resolution 3D PETs due to the lack of inter-slice septa. To address this problem, a fully 3D iterative scatter-corrected OSEM, in which a 3D single scatter simulation (SSS) is performed alternately with a 3D OSEM reconstruction until convergence, was recently proposed. However, due to the computational complexity of […]
Jun, 23

LOD Terrain Rendering by Local Parallel Processing on GPU

In this paper, we present a new technique for highly efficient terrain rendering using continuous view-dependent Level-of-Detail, based on the hardware tessellation unit found in modern GPUs. Our technique is based on parallel local processing, in the sense that the results at each terrain patch do not depend on results already obtained at other patches. This […]
Jun, 23

Case study: Runtime reduction of a buffer insertion algorithm using GPU parallel programming

In this paper, we present a case study on runtime reduction of VLSI CAD programs using parallel computing. Specifically, we parallelize a buffer insertion algorithm that minimizes power dissipation. We choose Graphics Processing Units (GPUs) as the low-cost hardware that supports parallel computing. We redesign the algorithm's data structure to accommodate GPU-based computing. As […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
