high performance computing on graphics processing units: hgpu.org

Posts

Oct, 20

Automatic program analysis for data parallel kernels

It is widely known that GPUs have more computational power and expose a far greater level of parallelism than conventional CPUs. Despite their high potential, GPUs are not yet a popular choice in practice, mainly because of their high programming complexity. The complexity derives from two factors. First, the existing programming models are tied to […]

CUDA

Oct, 20

GPU Accelerated X-Ray Image Enhancement

This paper presents an automated method for preparing digital X-rays for use by a procedural mesh generator. This process will facilitate the generation of a 3D polygon mesh depicting the bones contained within the X-ray image. The process of preparing the image involves identifying and retaining bone elements whilst removing any superfluous aspects contained within […]

Oct, 20

Enabling Computational Dynamics in Distributed Computing Environments Using a Heterogeneous Computing Template

This paper describes a software infrastructure made up of tools and libraries designed to assist developers in implementing computational dynamics applications running on heterogeneous and distributed computing environments. Together, these tools and libraries compose a so called Heterogeneous Computing Template (HCT). The underlying theme of the solution approach embraced by HCT is that of partitioning […]

CUDA

Oct, 20

A high performance computing framework for physics-based modeling and simulation of military ground vehicles

This paper describes a software infrastructure made up of tools and libraries designed to assist developers in implementing computational dynamics applications running on heterogeneous and distributed computing environments. Together, these tools and libraries compose a so called Heterogeneous Computing Template (HCT). The heterogeneous and distributed computing hardware infrastructure is assumed herein to be made up […]

CUDA

Oct, 20

Experimental B+-tree for GPU

The main intention of this work is to create a dictionary structure which could benefit from massive parallelism of threads when performing computation on all or a selected set of elements, while having an ability to search for and insert keys very quickly, yet preserving the order of elements. So far, no such structure dedicated […]

CUDA

Oct, 20

PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems

PEPPHER, a three-year European FP7 project, addresses efficient utilization of hybrid (heterogeneous) computer systems consisting of multicore CPUs with GPU-type accelerators. This article outlines the PEPPHER performance-aware component model, performance prediction means, runtime system, and other aspects of the project. A larger example demonstrates performance portability with the PEPPHER approach across hybrid systems with one […]

OpenCL

Oct, 20

FAMOUS, faster: using parallel computing techniques to accelerate the FAMOUS/HadCM3 climate model with a focus on the radiative transfer algorithm

We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. A task queue and a thread pool are used to distribute the computation to several processors. Finally, four […]

OpenCL

Oct, 20

Explicit platform descriptions for heterogeneous many-core architectures

Heterogeneous many-core architectures offer a way to cope with energy consumption limitations of various computing systems from small mobile devices to large data-centers. However, programmers typically must consider a large diversity of architectural information to develop efficient software. In this paper we present our ongoing work towards a Platform Description Language (PDL) that enables to […]

OpenCL

Oct, 20

On the Use of an Algebraic Language Interface for Waveform Definition

We discuss implementation aspects of a software-defined radio system that allows the user to define waveforms using an algebraic language interface, currently as an extension to C++. Current software-defined radio systems provide waveform definitions through a combination of a graphical interface, markup language, interpreted script, and compiled code. No matter which methods are used, the […]

Oct, 20

A prototyping environment for high performance reconfigurable computing

In the face of power wall and high performance requirements, designers of hardware architectures are directed more and more towards reconfigurable computing with the usage of heterogeneous CPU/FPGA systems. In such architectures, multi-core processors come with high computation rates while the reconfigurable logic offers high performance per watt and adaptability to the application constraints. However, […]

Oct, 19

An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs

The move towards heterogeneous parallel computing is underway as witnessed by the emergence of novel computing platforms combining architecturally diverse components such as CPUs, GPUs and special function units. We approach mapping of streaming applications onto heterogeneous architectures using a Process Network (PN) model of computation. In this paper, we present an approach for exploiting […]

CUDA

Oct, 19

The Potential for a GPU-Like Overlay Architecture for FPGAs

We propose a soft processor programming model and architecture inspired by graphics processing units (GPUs) that are well-matched to the strengths of FPGAs, namely, highly parallel and pipelinable computation. In particular, our soft processor architecture exploits multithreading, vector operations, and predication to supply a floating-point pipeline of 64 stages via hardware support for up to […]