Posts

Nov, 19

SGPU 2: a runtime system for using large applications on clusters of hybrid nodes

In this article, we consider hybrid architectures that consist of standard CPU cores associated with accelerators (such as GPUs). These architectures are increasingly employed in large computing centers. We develop a strategy designed to deal with hybrid computing architectures from the computing performance and programmability points of view. We focus on hybrid computing clusters that […]
Nov, 19

Predictive Modeling and Analysis of OP2 on Distributed Memory GPU Clusters

OP2 is an "active" library framework for the development and solution of unstructured mesh-based applications. It aims to decouple the scientific specification of an application from its parallel implementation to achieve code longevity and near-optimal performance through re-targeting the backend to different multi-core/many-core hardware. This paper presents a summary of a predictive performance analysis and […]
Nov, 19

Teaching graphics processing and architecture using a hardware prototyping approach

Since its introduction over two decades ago, graphics hardware has continued to evolve to improve rendering performance and increase programmability. While most undergraduate courses in computer graphics focus on rendering algorithms and programming APIs, we have recently created an undergraduate senior elective course that focuses on graphics processing and architecture, with a strong emphasis on […]
Nov, 19

StreamMR: An Optimized MapReduce Framework for AMD GPUs

MapReduce is a programming model from Google that facilitates parallel processing on a cluster of thousands of commodity computers. The success of MapReduce in cluster environments has motivated several studies of implementing MapReduce on a graphics processing unit (GPU), but generally focusing on the NVIDIA GPU. Our investigation reveals that the design and mapping of […]
Nov, 18

Design and Implementation of a PTX Emulation Library

Intel co-founder Gordon E. Moore observed in 1965 that transistor density, the number of transistors that could be placed in an integrated circuit per square inch, increased exponentially, doubling roughly every two years. This would later be known as Moore’s Law, correctly predicting the trend that governed computing hardware manufacturing for the late 20th century. […]
Nov, 18

Particle-based Visualization of Large Cosmological Datasets

Large quantities of simulated cosmological particle-based data cause considerable problems when it comes to real-time visualization. This paper considers an out-of-core approach for solving visualization problems on a single-desktop workstation. The approach proposed in this paper consists of two phases: the data preprocessing and its visualization. During the preprocessing, the cosmological data is hierarchically organized […]
Nov, 18

Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors

This paper shows how to build algorithms that use graphics processing units (GPUs) installed in most modern computers to solve dynamic equilibrium models in economics. In particular, we rely on the compute unified device architecture (CUDA) of NVIDIA GPUs. We illustrate the power of the approach by solving a simple real business cycle model with […]
Nov, 18

The MOPED framework: Object recognition and pose estimation for manipulation

We present MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework. We address two main challenges in computer vision for robotics: robust performance in complex scenes, and low latency for real-time operation. We achieve robust performance […]
Nov, 18

Fast Gather-based Construction of Stereoscopic Images Using Reprojection

We developed a very fast reprojection technique to generate stereoscopic images from a 2D image with depth information. The technique is gather-based and therefore very fast on current graphics hardware. The depth information is sampled at a specific offset which provides the depth to reproject from the left or right camera to the center camera. […]
Nov, 18

Accelerating The Cloud with Heterogeneous Computing

Heterogeneous multiprocessors that combine multiple CPUs and GPUs on a single die are poised to become commonplace in the market. As seen recently from the high performance computing community, leveraging a GPU can yield performance increases of several orders of magnitude. We propose using GPU acceleration to greatly speed up cloud management tasks in VMMs. […]
Nov, 18

Auto-tunable GPU BLAS

OpenCL is fast becoming the preferred framework used to make programs for heterogeneous platforms consisting of at least one CPU and one or more accelerators. Because the GPU is readily available in almost all computers, it is the most common accelerator in use. Good libraries are important to reduce development time and to make particular development environments, […]
Nov, 18

The Multi2Sim Simulation Framework: A CPU-GPU Model for Heterogeneous Computing

Multi2Sim is a simulation framework for heterogeneous computing, including models for superscalar, multithreaded, multicore, and graphics processors. Multi2Sim is an application-only simulator, which allows one or more applications to be run on top of it without booting a guest operating system first. In this chapter, an introduction to Multi2Sim is presented, and it is shown […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
