Posts

Sep, 15

A platform-independent tool for modeling parallel programs

Programming languages that can utilize the underlying parallel architecture in shared memory, distributed memory or Graphics Processing Units (GPUs) are used extensively for solving scientific problems. However, from our observation of studying multiple parallel programs from various domains, such programming languages have a substantial amount of sequential code mixed with the parallel code. When rewriting […]
Sep, 15

Ambient occlusion volumes

This paper introduces a new approximation algorithm for the near-field ambient occlusion problem. It combines known pieces in a new way to achieve substantially improved quality over fast methods and substantially improved performance compared to accurate methods. Intuitively, it computes the analog of a shadow volume for ambient light around each polygon, and then applies […]
Sep, 15

Visual simulation of shockwaves

We present an efficient method for visual simulations of shock phenomena in compressible, inviscid fluids. Our algorithm is derived from one class of the finite volume method especially designed for capturing shock propagation, but offers improved efficiency through physically-based simplification and adaptation for graphical rendering. Our technique is capable of handling complex, bidirectional object-shock interactions […]
Sep, 14

Auto-tuning of fast fourier transform on graphics processors

We present an auto-tuning framework for FFTs on graphics processors (GPUs). Due to complex design of the memory and compute subsystems on GPUs, the performance of FFT kernels over the range of possible input parameters can vary widely. We generate several variants for each component of the FFT kernel that, for different cases, are likely […]
Sep, 14

Shadowfax: scaling in heterogeneous cluster systems via GPGPU assemblies

Systems with specialized processors such as those used for accelerating computations (like NVIDIA’s graphics processors or IBM’s Cell) have proven their utility in terms of higher performance and lower power consumption. They have also been shown to outperform general purpose processors in case of graphics intensive or high performance applications and for enterprise applications […]
Sep, 14

OptiX: a general purpose ray tracing engine

The NVIDIA OptiX ray tracing engine is a programmable system designed for NVIDIA GPUs and other highly parallel architectures. The OptiX engine builds on the key observation that most ray tracing algorithms can be implemented using a small set of programmable operations. Consequently, the core of OptiX is a domain-specific just-in-time compiler that generates custom […]
Sep, 14

Nikola: embedding compiled GPU functions in Haskell

We describe Nikola, a first-order language of array computations embedded in Haskell that compiles to GPUs via CUDA using a new set of type-directed techniques to support re-usable computations. Nikola automatically handles a range of low-level details for Haskell programmers, such as marshaling data to/from the GPU, size inference for buffers, memory management, and automatic […]
Sep, 14

A practical approach of curved ray prestack Kirchhoff Time Migration on GPGPU

We introduced four prototypes of General Purpose GPU solutions by Compute Unified Device Architecture (CUDA) on NVidia GeForce 8800GT and Tesla C870 for a practical Curved Ray Prestack Kirchhoff Time Migration program, which is one of the most widely adopted imaging methods in the seismic data processing industry. We presented how to re-design and re-implement […]
Sep, 14

Anomalous behaviour detection using spatiotemporal oriented energies, subset inclusion histogram comparison and event-driven processing

This paper proposes a novel approach to anomalous behaviour detection in video. The approach is comprised of three key components. First, distributions of spatiotemporal oriented energy are used to model behaviour. This representation can capture a wide range of naturally occurring visual spacetime patterns and has not previously been applied to anomaly detection. Second, a […]
Sep, 14

The case for VOS: the vector operating system

Operating systems research for many-core systems has recently focused its efforts on supporting the scalability of OS-intensive applications running on increasingly parallel hardware. Lost amidst the march towards this parallel future is efficiency: Perfectly parallel software may saturate the parallel capabilities of the host system, but in doing so can waste hardware resources. This paper […]
Sep, 14

Seeing through the fog: an algorithm for fast and accurate touch detection in optical tabletop surfaces

Fast and accurate touch detection is critical to the usability of multi-touch tabletops. In optical tabletops, such as those using the popular FTIR and DI technologies, this requires efficient and effective noise reduction to enhance touches in the camera’s input. Common approaches to noise reduction do not scale to larger tables, leaving designers with a […]
Sep, 14

Subpixel reconstruction antialiasing for deferred shading

Subpixel Reconstruction Antialiasing (SRAA) combines single-pixel (1x) shading with subpixel visibility to create antialiased images without increasing the shading cost. SRAA targets deferred-shading renderers, which cannot use multisample antialiasing. SRAA operates as a post-process on a rendered image with superresolution depth and normal buffers, so it can be incorporated into an existing renderer without modifying […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
