Posts
Jul, 17
Linpack evaluation on a supercomputer with heterogeneous accelerators
We report Linpack benchmark results on the TSUBAME supercomputer, a large-scale heterogeneous system equipped with NVIDIA Tesla GPUs and ClearSpeed SIMD accelerators. Using all 10,480 Opteron cores, 640 Xeon cores, 648 ClearSpeed accelerators, and 624 NVIDIA Tesla GPUs, we achieved 87.01 TFlops, which is the third record as a heterogeneous system in the […]
Jul, 17
Color Seamlessness in Multi-Projector Displays Using Constrained Gamut Morphing
Multi-projector displays show significant spatial variation in 3D color gamut due to variation in the chromaticity gamuts across the projectors, the vignetting effect of each projector, and the overlap between adjacent projectors. In this paper we present a new constrained gamut morphing algorithm that removes all these variations and results in true color seamlessness across tiled […]
Jul, 17
Workload Characterization of 3D Games
The rapid pace of change in 3D game technology makes workload characterization necessary for every game generation. Compared to CPU characterization, far less quantitative information about games is available. This paper focuses on analyzing a set of modern 3D games at the API call level and at the microarchitectural level using the Attila simulator. […]
Jul, 17
Performance improvements of real-time crowd simulations
The current challenge for crowd simulations is the design and development of a scalable system capable of simulating the individual behavior of millions of complex agents populating large-scale virtual worlds at a good frame rate. To overcome this challenge, this thesis proposes several improvements for crowd simulations. Concretely, we propose […]
Jul, 17
Implementation of random linear network coding on OpenGL-enabled graphics cards
This paper describes the implementation of network coding on OpenGL-enabled graphics cards. Network coding is an interesting approach to increasing capacity and robustness in multi-hop networks. The current problem is to implement random linear network coding on mobile devices, which are limited in computational power, energy, and memory. Some mobile devices are equipped with […]
Jul, 17
Realtime background subtraction from dynamic scenes
This paper examines the problem of moving object detection. More precisely, it addresses the difficult scenarios where background scene textures in the video might change over time. We formulate the problem mathematically as minimizing a constrained risk functional motivated by the large margin principle. It is a generalization of the one class […]
Jul, 17
Using Graphics Processor Units (GPUs) for Automatic Video Structuring
The rapid development of graphics processor units (GPUs) in recent years, in terms of both performance and programmability, has attracted the attention of those seeking to leverage alternative architectures for better performance than commodity CPUs can provide. In this paper, the potential of the GPU in automatically structuring video is examined, specifically […]
Jul, 17
hiCUDA: High-Level GPGPU Programming
Graphics Processing Units (GPUs) have become competitive accelerators for applications outside the graphics domain, mainly driven by improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge for average programmers. In particular, CUDA places on the […]
Jul, 17
Highly parallel decoding of space-time codes on graphics processing units
Graphics processing units (GPUs) with a few hundred extremely simple processors represent a paradigm shift for highly parallel computations. We use this emergent GPU architecture to provide a first demonstration of the feasibility of real-time ML decoding (in software) of a high-rate space-time block code that is representative of codes incorporated in 4th […]
Jul, 17
H- and C-level WFST-based large vocabulary continuous speech recognition on Graphics Processing Units
We have implemented 20,000-word large vocabulary continuous speech recognition (LVCSR) systems employing H- and C-level weighted finite state transducer (WFST) based networks on Graphics Processing Units (GPUs). Both the emission probability computation and the Viterbi beam search are implemented on the GPU in a data-parallel manner to minimize the extra data transfer time between the […]
Jul, 17
Focus measurement on programmable graphics hardware for all in-focus rendering from light fields
This paper deals with a method for interactive rendering of photorealistic images, which is a fundamental technology in the field of virtual reality. Since the latest graphics processing units (GPUs) are programmable, they are expected to be useful for various applications including numerical computation and image processing. This paper proposes a method for focus measurement […]
Jul, 17
Speedup of Fuzzy Clustering Through Stream Processing on Graphics Processing Units
As the number of data points, feature dimensionality, and number of centers for clustering algorithms increase, computational tractability becomes a problem. The fuzzy c-means algorithm has a large degree of inherent algorithmic parallelism that modern CPU architectures do not exploit. Many pattern recognition algorithms can be sped up on a graphics processing unit (GPU) as long […]