Posts

Oct, 27

Techniques to maximize memory bandwidth on the Rigel compute accelerator

The Rigel compute accelerator has been developed to explore alternative architectures for massively parallel processor chips. Currently, GPUs that use wide SIMD are the primary implementations in this space. Many applications targeted to this space are performance limited by the memory wall, so comparing the memory system performance of Rigel and GPUs is desirable. Memory […]
Oct, 27

Efficient Simulation of Ocean and Land Scenes Based on Digital Earth

Efficient and realistic simulation of ocean and land scenes is an active and difficult problem in computer graphics. Most recent simulations of ocean and land scenes are based on a plane and confined to a limited region. They do not consider the earth's curvature or the boundary between ocean and land, and cannot […]
Oct, 27

Sample distribution shadow maps

This paper introduces Sample Distribution Shadow Maps (SDSMs), a new algorithm for hard and soft-edged shadows that greatly reduces undersampling, oversampling, and geometric aliasing errors compared to other shadow map techniques. SDSMs fall into the space between scene-dependent, variable-performance shadow algorithms and scene-independent, fixed-performance shadow algorithms. They provide a fully automated solution to shadow map […]
Oct, 27

Self-calibration of geometric and radiometric parameters for cone-beam computed tomography

Thanks to advances in parallel processing hardware, iterative algorithms for cone-beam reconstruction are now available with computation times acceptable for clinical use. At the same time, they are able to accommodate more accurately the physical effects underlying the X-ray imaging process. Many parameters are involved, which need to be precisely calibrated in order […]
Oct, 27

Development of a volume rendering system using 3D texture compression techniques on general-purpose personal computers

In this paper, we present the development of a high-speed volume rendering system that combines 3D texture compression and parallel programming techniques for rendering multiple high-resolution 3D images obtained with medical or industrial CT. The 3D texture compression algorithm (DXT5) provides extremely high efficiency since it reduces the memory consumption to 1/4 of the original […]
Oct, 27

The CUDA implementation of the method of lines for the curvature dependent flows

We study the use of a GPU for the numerical approximation of the curvature dependent flows of graphs – the mean-curvature flow and the Willmore flow. Both problems are often applied in image processing where fast solvers are required. We approximate these problems using the complementary finite volume method combined with the method of lines. […]
Oct, 27

Off-axis quantitative phase imaging processing using CUDA: toward real-time applications

We demonstrate real-time off-axis Quantitative Phase Imaging (QPI) using a phase reconstruction algorithm based on NVIDIA’s CUDA programming model. The phase unwrapping component is based on Goldstein’s algorithm. By mapping the process of extracting phase information and unwrapping to the GPU, we are able to speed up the whole procedure by more than 18.8x with […]
Oct, 27

Parallelization of Single Threaded Applications using OpenMP and CUDA/C

Extracting performance improvements from modest and cost-effective computing resources is one of the key challenges in the IT sector. CPU clock speeds have reached a plateau in recent years, with no significant clock speed improvements forthcoming. However, we see an increasing number of computational cores available on the desktop, via the CPU and, more recently, […]
Oct, 27

Efficient Implementation and Evaluation of Methods for the Estimation of Motion in Image Sequences

Optical flow estimation (the estimation of the apparent motion of objects in an image sequence) is used in many applications like video compression, object detection and tracking, robot navigation, and so on. This project was focussed on one specific optical flow estimation algorithm, which uses directional filters and an AM-FM demodulation algorithm for the estimation […]
Oct, 27

Efficient Implementation of Optical Flow Algorithm Based on Directional Filters on a GPU Using CUDA

This paper describes an optical flow estimation algorithm using directional filters and an AM-FM demodulation algorithm, and its efficient implementation on an NVIDIA GPU using CUDA. The resulting implementation is several thousand times faster than the corresponding MATLAB code, which makes the described scheme suitable for real-time applications. This paper also describes a new multiresolution […]
Oct, 26

Dense Dynamic Programming on Multi GPU

The implementation via CUDA of a hybrid dense dynamic programming method for knapsack problems on a multi-GPU architecture is considered. Tests are carried out on a Bull cluster with Tesla S1070 computing systems. A first series of computational results shows substantial speedup. The speedup factor is close to 28 with two GPUs.
Oct, 26

Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs

A number of issues related to floating point accuracy and compliance are a frequent source of confusion on both CPUs and GPUs. The purpose of this white paper is to discuss the most common issues related to NVIDIA GPUs and to supplement the documentation in the CUDA C Programming Guide.

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
