Posts
Nov, 19
Towards Efficient GPU Sharing on Multicore Processors
Scalable systems employing a mix of GPUs with CPUs are becoming increasingly prevalent in high-performance computing (HPC). The presence of such accelerators introduces significant challenges and complexities to both language developers and end users. This paper provides a close study of efficient coordination mechanisms to handle parallel requests from multiple hosts of control to a […]
Nov, 19
ShoveRand: a model-driven framework to easily generate random numbers on GP-GPU
Stochastic simulations are often sensitive to the randomness source that characterizes the statistical quality of their results. Consequently, we need highly reliable Random Number Generators (RNGs) to feed such applications. Recent developments try to shrink the computation time by using more and more General Purpose Graphics Processing Units (GP-GPUs) to speed up stochastic simulations. Such devices […]
Nov, 19
SGPU 2: a runtime system for using large applications on clusters of hybrid nodes
In this article, we consider hybrid architectures that consist of standard CPU cores associated with accelerators (such as GPUs). These architectures are increasingly employed in large computing centers. We develop a strategy designed to deal with hybrid computing architectures from the computing performance and programmability points of view. We focus on hybrid computing clusters that […]
Nov, 19
Predictive Modeling and Analysis of OP2 on Distributed Memory GPU Clusters
OP2 is an "active" library framework for the development and solution of unstructured mesh-based applications. It aims to decouple the scientific specification of an application from its parallel implementation to achieve code longevity and near-optimal performance through re-targeting the backend to different multi-core/many-core hardware. This paper presents a summary of a predictive performance analysis and […]
Nov, 19
Teaching graphics processing and architecture using a hardware prototyping approach
Since its introduction over two decades ago, graphics hardware has continued to evolve to improve rendering performance and increase programmability. While most undergraduate courses in computer graphics focus on rendering algorithms and programming APIs, we have recently created an undergraduate senior elective course that focuses on graphics processing and architecture, with a strong emphasis on […]
Nov, 19
StreamMR: An Optimized MapReduce Framework for AMD GPUs
MapReduce is a programming model from Google that facilitates parallel processing on a cluster of thousands of commodity computers. The success of MapReduce in cluster environments has motivated several studies of implementing MapReduce on a graphics processing unit (GPU), but generally focusing on the NVIDIA GPU. Our investigation reveals that the design and mapping of […]
Nov, 18
Design and Implementation of a PTX Emulation Library
Intel co-founder Gordon E. Moore observed in 1965 that transistor density, the number of transistors that could be placed in an integrated circuit per square inch, increased exponentially, doubling roughly every two years. This would be later known as Moore’s Law, correctly predicting the trend that governed computing hardware manufacturing for the late 20th century. […]
Nov, 18
Particle-based Visualization of Large Cosmological Datasets
Large quantities of simulated cosmological particle-based data cause considerable problems when it comes to real-time visualization. This paper considers an out-of-core approach for solving visualization problems on a single-desktop workstation. The approach proposed in this paper consists of two phases: the data preprocessing and its visualization. During the preprocessing, the cosmological data is hierarchically organized […]
Nov, 18
Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors
This paper shows how to build algorithms that use graphics processing units (GPUs) installed in most modern computers to solve dynamic equilibrium models in economics. In particular, we rely on the compute unified device architecture (CUDA) of NVIDIA GPUs. We illustrate the power of the approach by solving a simple real business cycle model with […]
Nov, 18
The MOPED framework: Object recognition and pose estimation for manipulation
We present MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework. We address two main challenges in computer vision for robotics: robust performance in complex scenes, and low latency for real-time operation. We achieve robust performance […]
Nov, 18
Fast Gather-based Construction of Stereoscopic Images Using Reprojection
We developed a very fast reprojection technique to generate stereoscopic images from a 2D image with depth information. The technique is gather-based and therefore very fast on current graphics hardware. The depth information is sampled at a specific offset which provides the depth to reproject from the left or right camera to the center camera. […]
Nov, 18
Accelerating The Cloud with Heterogeneous Computing
Heterogeneous multiprocessors that combine multiple CPUs and GPUs on a single die are poised to become commonplace in the market. As seen recently from the high performance computing community, leveraging a GPU can yield performance increases of several orders of magnitude. We propose using GPU acceleration to greatly speed up cloud management tasks in VMMs. […]

