Posts

Nov, 18

Design and Implementation of a PTX Emulation Library

Intel co-founder Gordon E. Moore observed in 1965 that transistor density, the number of transistors that could be placed in an integrated circuit per square inch, increased exponentially, doubling roughly every two years. This later became known as Moore’s Law, which correctly predicted the trend that governed computing hardware manufacturing throughout the late 20th century. […]
Nov, 18

Particle-based Visualization of Large Cosmological Datasets

Large quantities of simulated cosmological particle-based data cause considerable problems when it comes to real-time visualization. This paper considers an out-of-core approach for solving visualization problems on a single desktop workstation. The approach proposed in this paper consists of two phases: data preprocessing and visualization. During preprocessing, the cosmological data is hierarchically organized […]
Nov, 18

Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors

This paper shows how to build algorithms that use graphics processing units (GPUs) installed in most modern computers to solve dynamic equilibrium models in economics. In particular, we rely on the compute unified device architecture (CUDA) of NVIDIA GPUs. We illustrate the power of the approach by solving a simple real business cycle model with […]
Nov, 18

The MOPED framework: Object recognition and pose estimation for manipulation

We present MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework. We address two main challenges in computer vision for robotics: robust performance in complex scenes, and low latency for real-time operation. We achieve robust performance […]
Nov, 18

Fast Gather-based Construction of Stereoscopic Images Using Reprojection

We developed a very fast reprojection technique to generate stereoscopic images from a 2D image with depth information. The technique is gather-based and therefore well suited to current graphics hardware. The depth information is sampled at a specific offset, which provides the depth to reproject from the left or right camera to the center camera. […]
Nov, 18

Accelerating The Cloud with Heterogeneous Computing

Heterogeneous multiprocessors that combine multiple CPUs and GPUs on a single die are poised to become commonplace in the market. As the high performance computing community has recently shown, leveraging a GPU can yield performance increases of several orders of magnitude. We propose using GPU acceleration to greatly speed up cloud management tasks in virtual machine monitors (VMMs). […]
Nov, 18

Auto-tunable GPU BLAS

OpenCL is fast becoming the preferred framework for writing programs for heterogeneous platforms consisting of at least one CPU and one or more accelerators. Because the GPU is readily available in almost all computers, it is the most common accelerator in use. Good libraries are important to reduce development time and to make particular development environments, […]
Nov, 18

The Multi2Sim Simulation Framework: A CPU-GPU Model for Heterogeneous Computing

Multi2Sim is a simulation framework for heterogeneous computing, including models for superscalar, multithreaded, multicore, and graphics processors. Multi2Sim is an application-only simulator, which allows one or more applications to be run on top of it without booting a guest operating system first. In this chapter, an introduction to Multi2Sim is presented, and it is shown […]
Nov, 18

Optimizing the multipole-to-local operator in the fast multipole method for graphical processing units

This paper presents a number of algorithms to run the fast multipole method (FMM) on NVIDIA CUDA-capable graphical processing units (GPUs) (Nvidia Corporation, Santa Clara, CA, USA). The FMM is a class of methods that compute pairwise interactions between N particles to a given error tolerance with a computational cost of O(N). The methods described […]
Nov, 18

Neon: A Domain-Specific Programming Language for Image Processing

Neon is a high-level domain-specific programming language for writing efficient image processing programs which can run on either the CPU or the GPU. End users write Neon programs in a C# programming environment. When the Neon program is executed, our optimizing code generator outputs human-readable source files for either the CPU or GPU. These source […]
Nov, 17

Dax Toolkit: A Proposed Framework for Data Analysis and Visualization at Extreme Scale

Experts agree that the exascale machine will comprise processors that contain many cores, which in turn will necessitate a much higher degree of concurrency. Software will require at least 1,000 times more concurrency. Most parallel analysis and visualization algorithms today work by partitioning data and running mostly serial algorithms concurrently on each data […]
Nov, 17

Compilation for Heterogeneous Computing: Automating Analyses, Transformations and Decisions

Hardware accelerators, such as FPGA boards or GPUs, are an interesting alternative or a valuable complement to classic multi-core processors for computationally intensive software. However, it proves to be both costly and difficult to use legacy applications with these new heterogeneous targets. In particular, existing compilers are generally targeted toward code generation for sequential processors and […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors