Posts

Dec, 19

Heterogeneous Programming with Single Operation Multiple Data

Heterogeneity is omnipresent in today’s commodity computational systems, which comprise at least one multi-core Central Processing Unit (CPU) and one Graphics Processing Unit (GPU). Nonetheless, all this computing power is not being exploited in mainstream computing, as the programming of these systems entails many details of the underlying architecture and of its distinct execution models. […]
Dec, 18

Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer’s Perspective

Today, most CPU+Accelerator systems incorporate NVIDIA GPUs. Intel Xeon Phi and the continued evolution of AMD Radeon GPUs make it likely we will soon see, and want to program, a wider variety of CPU+Accelerator systems. PGI already supports NVIDIA GPUs, and is working to add support for Xeon Phi and AMD Radeon. Here we explore […]
Dec, 18

Fast Image Alignment with Fourier Moment Matching on GPU

In this paper, we develop a fast and accurate image alignment system which can be applied to image sequences in real time. The proposed image alignment system consists of two main components: the development of a Fourier moment matching system and the implementation of that system on the GPU. The Fourier moment matching is to efficiently find […]
Dec, 18

Efficient Multi-GPU Computation of All-Pairs Shortest Paths

We describe a new algorithm for solving the all-pairs shortest-path (APSP) problem for planar graphs and graphs with small separators that exploits the massive on-chip parallelism available in today’s Graphics Processing Units (GPUs). Our algorithm, based on the Floyd-Warshall algorithm, has near optimal complexity in terms of the total number of operations, while its matrix-based […]
Dec, 18

A comparative analysis of the performance and deployment overhead of parallelized Finite Difference Time Domain (FDTD) algorithms on a selection of high performance multiprocessor computing systems

The parallel FDTD method as used in computational electromagnetics is implemented on a variety of different high performance computing platforms. These parallel FDTD implementations have regularly been compared in terms of performance or purchase cost, but very little systematic consideration has been given to how much effort has been used to create the parallel FDTD […]
Dec, 18

GPU Accelerated Semiclassical Initial Value Representation Molecular Dynamics

This paper presents a graphics processing unit (GPU) implementation of the semiclassical initial value representation (SC-IVR) propagator for vibrational molecular spectroscopy calculations. The time-averaging formulation of the SC-IVR for power spectrum calculations is employed. Details about the CUDA implementation of the semiclassical code are provided. Four molecules with an increasing number of atoms are considered […]
Dec, 17

Data Structures for Task-based Priority Scheduling

Many task-parallel applications can benefit from attempting to execute tasks in a specific order, as for instance indicated by priorities associated with the tasks. We present three lock-free data structures for priority scheduling with different trade-offs on scalability and ordering guarantees. First we propose a basic extension to work-stealing that provides good scalability, but cannot […]
Dec, 17

Development methodologies for GPU and cluster of GPUs

This chapter outlines several development methodologies for obtaining efficient code in classical scientific applications. Those methodologies are based on the feedback from several research works involving GPUs, either alone in a single machine or in a cluster of machines. Indeed, our past collaborations with industry have allowed us to point out that in […]
Dec, 17

OpenCL Accelerated Multi-GPU Cone-Beam Reconstruction

Volume reconstruction in cone-beam CT is a computationally demanding task. In recent years, reconstruction has been accelerated by utilizing Graphics Processing Units (GPUs). Frameworks for General Purpose Computations on GPUs are a proven tool to access the resources of graphics cards. With the Open Computing Language (OpenCL) the first open standard for cross-vendor and cross-platform programming […]
Dec, 17

High Performance Computing Image Analysis for Radiotherapy Planning

The Edinburgh Cancer Centre at the Western General Hospital in Edinburgh is doing research on image analysis for predicting lung fibrosis induced by radiation as part of a treatment plan. They are developing a MATLAB code to analyse three-dimensional computed tomography (CT) images of patients but, because a standard three-dimensional CT image is […]
Dec, 17

Parallel Firewalls on General-Purpose Graphics Processing Units

Firewalls use a rule database to decide which packets will be allowed from one network onto another, thereby implementing a security policy. In high-speed networks, as the inter-arrival rate of packets decreases, the latency incurred by a firewall increases. In such a scenario, a single firewall becomes a bottleneck and reduces the overall throughput of […]
Dec, 16

Ray-Traced Collision Detection: Interpenetration Control and Multi-GPU Performance

We proposed [LGA13] an iterative ray-traced collision detection algorithm (IRTCD) that exploits spatial and temporal coherency and proved to be computationally efficient but at the price of some geometrical approximations that allow more interpenetration than needed. In this paper, we present two methods to efficiently control and reduce the interpenetration without noticeable computation overhead. The […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
