Posts

Aug, 10

THOR: A Transparent Heterogeneous Open Resource framework

Heterogeneous computing, which includes mixed architectures with multi-core CPUs as well as hardware accelerators such as GPU hardware, is needed to satisfy future computational needs and energy requirements. Cloud computing currently offers users whose computational needs vary greatly over time a cost-effective way to gain access to resources. While the current form of cloud-based systems […]
Aug, 10

Pricing the American Option Using Reconfigurable Hardware

We present a novel reconfigurable hardware architecture for accelerating American option pricing using the binomial lattice algorithm. The architecture provides double precision floating point pricing, evaluating up to N = 64,000 time steps in the binomial lattice. Advanced memory management techniques and optimized control logic allow for 4-way parallelism on a single-asset evaluation. These techniques […]
Aug, 10

Multi-layer depth peeling via fragment sort

We present an accelerated depth peeling algorithm for order-independent transparency rendering on graphics hardware. Unlike traditional depth peeling, which peels only one layer of transparent pixels per rendering pass, our algorithm peels multiple layers simultaneously per rendering pass. Our acceleration is achieved via our fragment program which sorts and writes multiple fragment colors and depths […]
Aug, 10

A Flexible Multi-Volume Shader Framework for Arbitrarily Intersecting Multi-Resolution Datasets

We present a powerful framework for 3D-texture-based rendering of multiple arbitrarily intersecting volumetric datasets. Each volume is represented by a multi-resolution octree-based structure and we use out-of-core techniques to support extremely large volumes. Users define a set of convex polyhedral volume lenses, which may be associated with one or more volumetric datasets. The volumes or […]
Aug, 10

Real-time continuum grass

Simulating a grass field in real time has many applications, such as in virtual reality and games. Modeling accurate grass-grass, grass-object and grass-wind interactions incurs a high computational cost. In this paper, we present a method to simulate a grass field in real time by considering the grass field as a two-dimensional grid-based continuum and shifting the complex interactions […]
Aug, 10

Performance evaluation and optimization of random memory access on multicores with high productivity

The slow progress in memory access latencies in comparison to CPU speeds has resulted in memory accesses dominating code performance. While architectural enhancements have benefited applications with data locality and sequential access, random memory access still remains a cause for concern. Several benchmarks have been proposed to evaluate the random memory access performance on multicore […]
Aug, 10

A parallel mapping of optical flow to Compute Unified Device Architecture for motion-based image segmentation

A correlation-based optical flow algorithm using compute unified device architecture (CUDA) technology to achieve fast motion-based image segmentation is described. Using CUDA, a 240 processor GPU implementation of an optimized correlation-based optical flow algorithm allows segmentation to be achieved at high frame rates on high-resolution video sequences. Details of the mapping of the optical flow […]
Aug, 10

Approaches for parallelizing reductions on modern GPUs

GPU hardware and software have been evolving rapidly. CUDA versions 1.1 and higher started supporting atomic operations on device memory, and CUDA versions 1.2 and higher started supporting atomic operations on shared memory. This paper focuses on parallelizing applications involving reductions on GPUs. Prior to the availability of support for locking, these applications could only […]
Aug, 9

G-NetMon: A GPU-accelerated Network Performance Monitoring System

At Fermilab, we have prototyped a GPU-accelerated network performance monitoring system, called G-NetMon, to support large-scale scientific collaborations. In this work, we explore new opportunities in network traffic monitoring and analysis with GPUs. Our system exploits the data parallelism that exists within network flow data to provide fast analysis of bulk data movement between Fermilab […]
Aug, 9

G-NetMon: A GPU-accelerated Network Performance Monitoring System for Large Scale Scientific Collaborations

Network traffic is difficult to monitor and analyze, especially in high-bandwidth networks. Performance analysis, in particular, presents extreme complexity and scalability challenges. GPU (Graphics Processing Unit) technology has been utilized recently to accelerate general purpose scientific and engineering computing. GPUs offer extreme thread-level parallelism with hundreds of simple cores. Their data-parallel execution model can rapidly […]
Aug, 9

Real-Time All-in-Focus Video-Based Rendering Using A Network Camera Array

We present a real-time video-based rendering system using a network camera array. Our system consists of 64 commodity network cameras that are connected to a single PC through Gigabit Ethernet. To render a high-quality novel view, we estimate a view-dependent per-pixel depth map in real time by using a layered representation. The rendering algorithm is […]
Aug, 9

Graphics Processing Units for Handhelds

During the past few years, mobile phones and other handheld devices have gone from only handling dull text-based menu systems to, on an increasing number of models, being able to render high-quality three-dimensional graphics at high frame rates. This paper is a survey of the special considerations that must be taken when designing graphics processing […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors