Posts
Jul, 17
Voxels on fire
We introduce a method for the animation of fire propagation and the burning consumption of objects represented as volumetric data sets. Our method uses a volumetric fire propagation model based on an enhanced distance field. It can simulate the spreading of multiple fire fronts over a specified isosurface without actually having to create that isosurface. […]
Jul, 17
Arbitrary dimension Reed-Solomon coding and decoding for extended RAID on GPUs
Reed-Solomon coding is a method of generating arbitrary amounts of checksum information from original data via matrix-vector multiplication in finite fields. Previous work has shown that CPUs are not well matched to this type of computation, but recent graphics processing units (GPUs) have been shown through a case study to perform this encoding quickly for the […]
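As a rough illustration of "checksums via matrix-vector multiplication in finite fields", here is a minimal CPU-side sketch in Python. It is not the paper's GPU implementation. The field GF(2^8) with reduction polynomial 0x11D, the Vandermonde-style `coding_matrix`, and all function names are assumptions chosen for the example.

```python
# Illustrative sketch only. The field polynomial (0x11D) and the coding
# matrix layout are assumptions for this example, not taken from the post.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add, reducing by 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D           # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def coding_matrix(num_checksums: int, num_data: int) -> list[list[int]]:
    """Hypothetical Vandermonde-style matrix: row i, column j holds (j+1)^i."""
    return [[gf_pow(j + 1, i) for j in range(num_data)]
            for i in range(num_checksums)]


def encode(data: list[int], num_checksums: int) -> list[int]:
    """Compute checksum bytes as a matrix-vector product over GF(2^8)."""
    checksums = []
    for row in coding_matrix(num_checksums, len(data)):
        acc = 0
        for coeff, byte in zip(row, data):
            acc ^= gf_mul(coeff, byte)
        checksums.append(acc)
    return checksums
```

For instance, `encode([5, 7], 2)` returns `[2, 11]`: the first checksum is the plain XOR parity of the data bytes, and the second weights each byte by a different field element. A real code would choose its matrix carefully so that any lost bytes remain recoverable; this sketch covers only the encoding step.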
Jul, 17
Linpack evaluation on a supercomputer with heterogeneous accelerators
We report Linpack benchmark results on the TSUBAME supercomputer, a large-scale heterogeneous system equipped with NVIDIA Tesla GPUs and ClearSpeed SIMD accelerators. With all of 10,480 Opteron cores, 640 Xeon cores, 648 ClearSpeed accelerators and 624 NVIDIA Tesla GPUs, we have achieved 87.01 TFlops, which is the third record as a heterogeneous system in the […]
Jul, 17
Color Seamlessness in Multi-Projector Displays Using Constrained Gamut Morphing
Multi-projector displays show significant spatial variation in 3D color gamut due to variation in the chromaticity gamuts across the projectors, the vignetting effect of each projector, and overlap across adjacent projectors. In this paper we present a new constrained gamut morphing algorithm that removes all these variations and results in true color seamlessness across tiled […]
Jul, 17
Workload Characterization of 3D Games
The rapid pace of change in 3D game technology makes workload characterization necessary for every game generation. Compared to CPU characterization, far less quantitative information about games is available. This paper focuses on analyzing a set of modern 3D games at the API call level and at the microarchitectural level using the Attila simulator. […]
Jul, 17
Performance improvements of real-time crowd simulations
The current challenge for crowd simulations is the design and development of a scalable system that is capable of simulating the individual behavior of millions of complex agents populating large scale virtual worlds with a good frame rate. In order to overcome this challenge, this thesis proposes different improvements for crowd simulations. Concretely, we propose […]
Jul, 17
Implementation of random linear network coding on OpenGL-enabled graphics cards
This paper describes the implementation of network coding on OpenGL-enabled graphics cards. Network coding is an interesting approach to increase the capacity and robustness in multi-hop networks. The current problem is to implement random linear network coding on mobile devices which are limited in computational power, energy, and memory. Some mobile devices are equipped with […]
Jul, 17
Realtime background subtraction from dynamic scenes
This paper examines the problem of moving object detection. More precisely, it addresses the difficult scenarios where background scene textures in the video might change over time. In this paper, we formulate the problem mathematically as minimizing a constrained risk functional motivated by the large margin principle. It is a generalization of the one class […]
Jul, 17
Using Graphics Processor Units (GPUs) for Automatic Video Structuring
The rapid pace of development of graphic processor units (GPUs) in recent years in terms of performance and programmability has attracted the attention of those seeking to leverage alternative architectures for better performance than that which commodity CPUs can provide. In this paper, the potential of the GPU in automatically structuring video is examined, specifically […]
Jul, 17
hiCUDA: High-Level GPGPU Programming
Graphics Processing Units (GPUs) have become a competitive accelerator for applications outside the graphics domain, mainly driven by the improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge to average programmers. In particular, CUDA places on the […]
Jul, 17
Highly parallel decoding of space-time codes on graphics processing units
Graphics processing units (GPUs) with a few hundred extremely simple processors represent a paradigm shift for highly parallel computations. We use this emergent GPU architecture to provide a first demonstration of the feasibility of real time ML decoding (in software) of a high rate space-time block code that is representative of codes incorporated in 4th […]
Jul, 17
H- and C-level WFST-based large vocabulary continuous speech recognition on Graphics Processing Units
We have implemented 20,000-word large vocabulary continuous speech recognition (LVCSR) systems employing H- and C-level weighted finite state transducer (WFST) based networks on Graphics Processing Units (GPUs). Both the emission probability computation and the Viterbi beam search are implemented on the GPU in a data-parallel manner to minimize the extra data transfer time between the […]