Posts

Jun, 20

Implementing Sparse Matrix-Vector multiplication using CUDA based on a hybrid sparse matrix format

The Sparse Matrix-Vector product (SpMV) is a key operation in engineering and scientific computing. Methods for efficiently implementing it in parallel are critical to the performance of many applications. Modern Graphics Processing Units (GPUs), coupled with the advent of general-purpose programming environments like NVIDIA’s CUDA, have gained interest as a viable architecture for data-parallel […]
Jun, 20

Second Order Pre-Integrated Volume Rendering

In the field of Volume Rendering, the pre-integration of arbitrary transfer functions has certainly led to the most significant and convincing results in terms of both quality and performance, allowing high-quality visualization on standard consumer PC graphics hardware. By showing that the ideal scalar signal along the cast rays is better approximated by a succession of polynomial […]
Jun, 20

GPUs for fast triggering and pattern matching at the CERN experiment NA62

In high-energy physics experiments, the trigger system is crucial for reducing the quantity of data recorded on tape and the acquisition bandwidth requirements. This is particularly true in rare-decay experiments. The NA62 experiment aims at measuring the branching ratio of $K^+ \to \pi^+ \nu \bar{\nu}$, predicted in the Standard Model (SM) at the level of $\sim 10^{-10}$. In […]
Jun, 20

Large-Scale Stereo Display Wall Using Programmable Graphics Hardware

In this paper, we present a large-scale stereo display wall system for tangible telemeeting using programmable graphics hardware. For tangible telemeeting, it is important to provide an immersive, high-resolution display that covers the field of view and gives the local user the same environment as that of the remote site. To achieve […]
Jun, 20

Efficient Surface Reconstruction From Noisy Data Using Regularized Membrane Potentials

A physically motivated method for surface reconstruction is proposed that can recover smooth surfaces from noisy and sparse data sets. No orientation information is required. Using a new technique based on regularized-membrane potentials, the input sample points are aggregated, leading to improved noise tolerance and outlier removal without sacrificing much with respect to detail (feature) […]
Jun, 20

Solving 2D Nonlinear Unsteady Convection-Diffusion Equations on Heterogenous Platforms with Multiple GPUs

Solving complex convection-diffusion equations is very important to many practical mathematical and physical problems. After finite difference discretization, most of the solution time is spent in sparse linear equation solvers. In this paper, our goal is to solve 2D Nonlinear Unsteady Convection-Diffusion Equations by accelerating an iterative algorithm named Jacobi-preconditioned QMRCGSTAB on […]
Jun, 20

Accelerating batched 1D-FFT with a CUDA-capable computer

This work concerns the application of CUDA-based software (Compute Unified Device Architecture), developed by NVIDIA for programmable Graphics Processing Units (GPUs). CUDA code is written in ‘C for CUDA’, that is, the standard C programming language with NVIDIA extensions. Our goal was to find out whether batched (multiple) one-dimensional Fast Fourier Transforms (1D FFT), often encountered in various […]
Jun, 20

Distance field transform with an adaptive iteration method

We propose a novel distance field transform method based on an iterative method adaptively performed on an evolving active band. Our method utilizes a narrow band to store active grid points being computed. Unlike the conventional fast marching method, we do not maintain a priority queue, and instead, perform iterative computing inside the band. This […]
Jun, 20

Image-Based Material Restyling with Fast Non-local Means Filtering

This paper presents a new GPU-based implementation of fast non-local means (NLM) filtering for material restyling. Our fast NLM filtering algorithm is able to achieve real-time feedback during interactive image editing. Furthermore, a novel material editing method based on our fast NLM filtering is proposed to change the material appearance of image-based objects. Given an […]
Jun, 20

A Parallel Streaming Motion Estimation for Real-Time HD H.264 Encoding on Programmable Processors

Motion estimation is an important, compute-intensive component in most video compression standards. The high computational costs and heavy memory bandwidth requirements of motion estimation place heavy pressure on most existing programmable processors, especially in real-time high-definition H.264 video encoding. The emerging stream processing model supported by most programmable processors provides a powerful mechanism to […]
Jun, 20

Volumetric Ambient Occlusion for Real-Time Rendering and Games

This new algorithm, based on GPUs, can compute ambient occlusion to inexpensively approximate global-illumination effects in real-time systems and games. The first step in deriving this algorithm is to examine how ambient occlusion relates to the physically founded rendering equation. The correspondence stems from a fuzzy membership function that defines what constitutes nearby occlusions. The […]
Jun, 19

Auto-tuning Dense Matrix Multiplication for GPGPU with Cache

In this paper we discuss our experience in improving the performance of GEMM (both single and double precision) on the Fermi architecture using CUDA, and how new features of Fermi, such as the cache, affect performance. It is found that the addition of a cache to the GPU on the one hand helps the processors take advantage of […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
