Posts

Oct, 21

Accelerating exotic option pricing and model calibration using GPUs

Pricing and risk analysis for today’s exotic structured equity products are increasingly computationally demanding and time-consuming. GPUs can significantly increase computing performance, even at reduced cost. We applied this technology to replace a large part of our CPU-based computing grid with hybrid GPU/CPU pricing engines. One GPU-based […]
Oct, 21

OpenMP for Accelerators

OpenMP [14] is the dominant programming model for shared-memory parallelism in C, C++ and Fortran due to its easy-to-use directive-based style, portability and broad support by compiler vendors. Compute-intensive application regions are increasingly being accelerated using devices such as GPUs and DSPs, and a programming model with similar characteristics is needed here. This paper presents […]
Oct, 21

Efficient Synchronization Primitives for GPUs

In this paper, we revisit the design of synchronization primitives—specifically barriers, mutexes, and semaphores—and how they apply to the GPU. Previous implementations are insufficient due to the discrepancies in hardware and programming model of the GPU and CPU. We create new implementations in CUDA and analyze the performance of spinning on the GPU, as well […]
Oct, 20

Automatic program analysis for data parallel kernels

It is widely known that GPUs have more computational power and expose a far greater level of parallelism than conventional CPUs. Despite their high potential, GPUs are not yet a popular choice in practice, mainly because of their high programming complexity. The complexity derives from two factors. First, the existing programming models are tied to […]
Oct, 20

GPU Accelerated X-Ray Image Enhancement

This paper presents an automated method for preparing digital X-rays for use by a procedural mesh generator. This process will facilitate the generation of a 3D polygon mesh depicting the bones contained within the X-ray image. The process of preparing the image involves identifying and retaining bone elements whilst removing any superfluous aspects contained within […]
Oct, 20

Enabling Computational Dynamics in Distributed Computing Environments Using a Heterogeneous Computing Template

This paper describes a software infrastructure made up of tools and libraries designed to assist developers in implementing computational dynamics applications running on heterogeneous and distributed computing environments. Together, these tools and libraries compose a so-called Heterogeneous Computing Template (HCT). The underlying theme of the solution approach embraced by HCT is that of partitioning […]
Oct, 20

A high performance computing framework for physics-based modeling and simulation of military ground vehicles

This paper describes a software infrastructure made up of tools and libraries designed to assist developers in implementing computational dynamics applications running on heterogeneous and distributed computing environments. Together, these tools and libraries compose a so-called Heterogeneous Computing Template (HCT). The heterogeneous and distributed computing hardware infrastructure is assumed herein to be made up […]
Oct, 20

Experimental B+-tree for GPU

The main intention of this work is to create a dictionary structure that can benefit from massive thread parallelism when performing computation on all or a selected set of elements, while still supporting very fast key search and insertion and preserving the order of elements. So far, no such structure dedicated […]
Oct, 20

PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems

PEPPHER, a three-year European FP7 project, addresses efficient utilization of hybrid (heterogeneous) computer systems consisting of multicore CPUs with GPU-type accelerators. This article outlines the PEPPHER performance-aware component model, performance prediction means, runtime system, and other aspects of the project. A larger example demonstrates performance portability with the PEPPHER approach across hybrid systems with one […]
Oct, 20

FAMOUS, faster: using parallel computing techniques to accelerate the FAMOUS/HadCM3 climate model with a focus on the radiative transfer algorithm

We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. A task queue and a thread pool are used to distribute the computation to several processors. Finally, four […]
Oct, 20

Explicit platform descriptions for heterogeneous many-core architectures

Heterogeneous many-core architectures offer a way to cope with energy consumption limitations of various computing systems from small mobile devices to large data-centers. However, programmers typically must consider a large diversity of architectural information to develop efficient software. In this paper we present our ongoing work towards a Platform Description Language (PDL) that enables to […]
Oct, 20

On the Use of an Algebraic Language Interface for Waveform Definition

We discuss implementation aspects of a software-defined radio system that allows the user to define waveforms using an algebraic language interface, currently as an extension to C++. Current software-defined radio systems provide waveform definitions through a combination of a graphical interface, markup language, interpreted script, and compiled code. No matter which methods are used, the […]

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
