Posts
Aug, 11
The DabR – A multitouch system for intuitive 3D scene navigation
Multi-touch displays are one of the central emerging technologies in human-computer interfaces, and many commercial products such as the Apple iPhone or the Microsoft Surface already show the benefit of this interaction technique. But most applications are limited to 2D interaction, and little effort has been spent on intuitive 3D interaction […]
Aug, 10
GPU acceleration of matrix-based methods in computational electromagnetics (thesis)
This work considers the acceleration of matrix-based computational electromagnetic (CEM) techniques using graphics processing units (GPUs). These massively parallel processors have gained much support since late 2006, with software tools such as CUDA and OpenCL greatly simplifying the process of harnessing the computational power of these devices. As with any advances in computation, the use […]
Aug, 10
A view-dependent adaptivity metric for real time mesh tessellation
Real-time tessellation methods offer the ability to upsample 3D surface meshes on the fly during rendering. This upsampling relies on three major steps. First, it requires a tessellation kernel, which can be implemented on the GPU or may already be available as a hardware unit. Second, the surface model defines the positions of the newly inserted […]
Aug, 10
THOR: A Transparent Heterogeneous Open Resource framework
Heterogeneous computing, which includes mixed architectures with multi-core CPUs as well as hardware accelerators such as GPUs, is needed to satisfy future computational needs and energy requirements. Cloud computing currently offers users whose computational needs vary greatly over time a cost-effective way to gain access to resources. While the current form of cloud-based systems […]
Aug, 10
Pricing the American Option Using Reconfigurable Hardware
We present a novel reconfigurable hardware architecture for accelerating American option pricing using the binomial lattice algorithm. The architecture provides double precision floating point pricing, evaluating up to N = 64,000 time steps in the binomial lattice. Advanced memory management techniques and optimized control logic allow for 4-way parallelism on a single-asset evaluation. These techniques […]
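For readers unfamiliar with the binomial lattice algorithm named in this abstract, here is a minimal software sketch in Python. It is not the paper's hardware architecture: it assumes the common Cox-Ross-Rubinstein parameterization and an American put, and the function name and parameters are hypothetical, chosen only for illustration.

```python
import math


def american_put_binomial(spot, strike, rate, sigma, maturity, steps):
    """Price an American put on a binomial lattice (CRR parameters assumed)."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))      # up-move factor per step
    down = 1.0 / up                           # down-move factor per step
    disc = math.exp(-rate * dt)               # one-step discount factor
    p = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability

    # Option values at maturity; node j has seen j up-moves.
    values = [
        max(strike - spot * up**j * down**(steps - j), 0.0)
        for j in range(steps + 1)
    ]

    # Backward induction: at each node keep the larger of the discounted
    # continuation value and the payoff from exercising immediately.
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            continuation = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            exercise = strike - spot * up**j * down**(i - j)
            values[j] = max(continuation, exercise)

    return values[0]


if __name__ == "__main__":
    print(american_put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 500))
```

The inner loop over `j` has no dependencies between nodes of the same time step, which is the kind of structure a parallel implementation can exploit; how the paper's design actually achieves its 4-way parallelism is not described in this excerpt.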
Aug, 10
Multi-layer depth peeling via fragment sort
We present an accelerated depth peeling algorithm for order-independent transparency rendering on graphics hardware. Unlike traditional depth peeling, which peels only one layer of transparent pixels per rendering pass, our algorithm peels multiple layers simultaneously per rendering pass. Our acceleration is achieved via our fragment program which sorts and writes multiple fragment colors and depths […]
Aug, 10
A Flexible Multi-Volume Shader Framework for Arbitrarily Intersecting Multi-Resolution Datasets
We present a powerful framework for 3D-texture-based rendering of multiple arbitrarily intersecting volumetric datasets. Each volume is represented by a multi-resolution octree-based structure and we use out-of-core techniques to support extremely large volumes. Users define a set of convex polyhedral volume lenses, which may be associated with one or more volumetric datasets. The volumes or […]
Aug, 10
Real-time continuum grass
Simulating a grass field in real time has many applications, such as in virtual reality and games. Modeling accurate grass-grass, grass-object and grass-wind interactions incurs a high computational cost. In this paper, we present a method to simulate a grass field in real time by treating the field as a two-dimensional grid-based continuum and shifting the complex interactions […]
Aug, 10
Performance evaluation and optimization of random memory access on multicores with high productivity
The slow progress in memory access latencies in comparison to CPU speeds has resulted in memory accesses dominating code performance. While architectural enhancements have benefited applications with data locality and sequential access, random memory access remains a cause for concern. Several benchmarks have been proposed to evaluate the random memory access performance on multicore […]
Aug, 10
A parallel mapping of optical flow to Compute Unified Device Architecture for motion-based image segmentation
A correlation-based optical flow algorithm using Compute Unified Device Architecture (CUDA) technology to achieve fast motion-based image segmentation is described. Using CUDA, a 240-processor GPU implementation of an optimized correlation-based optical flow algorithm allows segmentation to be achieved at high frame rates on high-resolution video sequences. Details of the mapping of the optical flow […]
Aug, 10
Approaches for parallelizing reductions on modern GPUs
GPU hardware and software have been evolving rapidly. CUDA versions 1.1 and higher started supporting atomic operations on device memory, and CUDA versions 1.2 and higher started supporting atomic operations on shared memory. This paper focuses on parallelizing applications involving reductions on GPUs. Prior to the availability of support for locking, these applications could only […]
Aug, 9
G-NetMon: A GPU-accelerated Network Performance Monitoring System
At Fermilab, we have prototyped a GPU-accelerated network performance monitoring system, called G-NetMon, to support large-scale scientific collaborations. In this work, we explore new opportunities in network traffic monitoring and analysis with GPUs. Our system exploits the data parallelism that exists within network flow data to provide fast analysis of bulk data movement between Fermilab […]