Posts
Dec, 19
Experiences Porting a Molecular Dynamics Code to GPUs on a Cray XK7
GPU computing has rapidly gained popularity as a way to improve the performance of many scientific applications. In this paper we report on our experience porting a hybrid MPI+OpenMP molecular dynamics code to a GPU-enabled Cray XK7, producing a hybrid MPI+GPU code. The target machine, Indiana University’s Big Red II, consists of a […]
Dec, 19
A Data-Driven Model for Anisotropic Heterogeneous Subsurface Scattering
We present a new BSSRDF representation for editing measured anisotropic heterogeneous translucent materials, such as veined marble, jade, and artificial stones with light-blocking discontinuities. Our work is inspired by the SubEdit representation introduced in [1]. Our main contribution is to improve the accuracy of the approximation while keeping it compact and efficient for editing. We decompose the […]
Dec, 19
A Two-stage Query by Singing/Humming System on GPU
This paper proposes using a GPU (graphics processing unit) to implement a two-stage comparison method for a QBSH (query by singing/humming) system. The system takes a user’s singing or humming and retrieves the top-10 most likely candidates from a database of 8431 songs. To speed up the comparison, we apply linear […]
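The abstract is cut off before it names the exact comparison measures, so the sketch below only illustrates the general two-stage idea under stated assumptions: a cheap first pass (here, a hypothetical linear rescaling of the query pitch contour to each song's length followed by a mean absolute difference) keeps a short candidate list, and a costlier second pass (here, dynamic time warping, DTW) re-ranks only that list. Both distance choices, all function names, and the toy pitch data are illustrative assumptions, not the paper's method, and the sketch runs on the CPU rather than a GPU.

```python
def linear_rescale(seq, length):
    """Resample a pitch sequence to `length` points by linear interpolation (assumed stage-1 step)."""
    if length == 1 or len(seq) == 1:
        return [float(seq[0])] * length
    step = (len(seq) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def coarse_distance(query, song):
    """Stage 1 (cheap): rescale the query to the song length, then mean absolute difference."""
    q = linear_rescale(query, len(song))
    return sum(abs(a - b) for a, b in zip(q, song)) / len(song)


def dtw_distance(a, b):
    """Stage 2 (expensive): classic O(len(a) * len(b)) DTW with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def two_stage_search(query, database, shortlist=3, top_k=1):
    """Filter all songs with the cheap measure, then re-rank only the shortlist with DTW."""
    coarse = sorted(database, key=lambda sid: coarse_distance(query, database[sid]))
    candidates = coarse[:shortlist]
    return sorted(candidates, key=lambda sid: dtw_distance(query, database[sid]))[:top_k]


# Hypothetical toy "database" of MIDI pitch contours.
songs = {
    "a": [60, 62, 64, 65, 67],
    "b": [67, 65, 64, 62, 60],
    "c": [60, 60, 60, 60, 60],
}
print(two_stage_search([60, 62, 64, 65, 67], songs, shortlist=2))  # ['a']
```

The point of the split is that the expensive measure only ever sees `shortlist` candidates, so its cost no longer grows with the full database size.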
Dec, 19
Heterogeneous Programming with Single Operation Multiple Data
Heterogeneity is omnipresent in today’s commodity computational systems, which comprise at least one multi-core Central Processing Unit (CPU) and one Graphics Processing Unit (GPU). Nonetheless, much of this computing power goes unexploited in mainstream computing, because programming these systems requires dealing with many details of the underlying architecture and its distinct execution models. […]
Dec, 18
Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer’s Perspective
Today, most CPU+Accelerator systems incorporate NVIDIA GPUs. Intel Xeon Phi and the continued evolution of AMD Radeon GPUs make it likely we will soon see, and want to program, a wider variety of CPU+Accelerator systems. PGI already supports NVIDIA GPUs, and is working to add support for Xeon Phi and AMD Radeon. Here we explore […]
Dec, 18
Fast Image Alignment with Fourier Moment Matching on GPU
In this paper, we develop a fast and accurate image alignment system that can be applied to image sequences in real time. The proposed system consists of two main components: a Fourier moment matching method and its implementation on the GPU. Fourier moment matching is used to efficiently find […]
Dec, 18
Efficient Multi-GPU Computation of All-Pairs Shortest Paths
We describe a new algorithm for solving the all-pairs shortest-path (APSP) problem for planar graphs and graphs with small separators that exploits the massive on-chip parallelism available in today’s Graphics Processing Units (GPUs). Our algorithm, based on the Floyd-Warshall algorithm, has near optimal complexity in terms of the total number of operations, while its matrix-based […]
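For reference, the Floyd-Warshall recurrence the abstract builds on is shown below as a plain sequential sketch. Each outer iteration `k` relaxes every pair through vertex `k`, i.e. `D[i][j] = min(D[i][j], D[i][k] + D[k][j])`; assuming no negative cycles, row `k` and column `k` do not change during step `k`, so all n² updates in that step are independent. That regular, matrix-shaped work is what makes the method attractive on GPUs. This is not the paper's separator-based multi-GPU algorithm; the function name and example graph are illustrative.

```python
INF = float("inf")


def floyd_warshall(n, edges):
    """All-pairs shortest paths for a directed graph given as (u, v, weight) edges."""
    dist = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the lightest parallel edge
    for k in range(n):
        # Row k and column k are fixed during step k (no negative cycles),
        # so every (i, j) update below could run in parallel.
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == INF:
                continue  # vertex k is unreachable from i; nothing to relax
            row_i = dist[i]
            for j in range(n):
                if d_ik + row_k[j] < row_i[j]:
                    row_i[j] = d_ik + row_k[j]
    return dist


edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
print(floyd_warshall(4, edges)[0])  # [0.0, 3, 1, 4]
```

The three nested loops make the O(n³) operation count explicit; GPU versions typically tile this same loop nest so that blocks of the distance matrix stay in on-chip memory.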
Dec, 18
A comparative analysis of the performance and deployment overhead of parallelized Finite Difference Time Domain (FDTD) algorithms on a selection of high performance multiprocessor computing systems
The parallel FDTD method as used in computational electromagnetics is implemented on a variety of different high performance computing platforms. These parallel FDTD implementations have regularly been compared in terms of performance or purchase cost, but very little systematic consideration has been given to how much development effort was required to create the parallel FDTD […]
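To make concrete what such implementations parallelize, here is a minimal one-dimensional FDTD (Yee) loop in normalized free-space units, following the common textbook form in which both update coefficients equal 0.5. The electric and magnetic fields live on staggered grids, and each update reads only its nearest neighbour, so when the grid is split across processors each time step needs only a one-cell halo exchange at subdomain edges. The grid size, Gaussian source parameters, and function name are hypothetical, and the boundaries are simply held at zero (reflecting) rather than absorbing.

```python
import math


def fdtd_1d(n_cells=200, n_steps=100, source_cell=100, t0=40.0, spread=12.0):
    """1D free-space FDTD in normalized units (update coefficient 0.5)."""
    ex = [0.0] * n_cells  # electric field samples
    hy = [0.0] * n_cells  # magnetic field samples, staggered half a cell
    for t in range(n_steps):
        # E update: cell k needs only H at k-1 and k; ex[0] stays 0 (reflecting edge).
        for k in range(1, n_cells):
            ex[k] += 0.5 * (hy[k - 1] - hy[k])
        # Soft Gaussian source added at one cell.
        ex[source_cell] += math.exp(-0.5 * ((t0 - t) / spread) ** 2)
        # H update: cell k needs only E at k and k+1; hy[-1] stays 0 (reflecting edge).
        for k in range(n_cells - 1):
            hy[k] += 0.5 * (ex[k] - ex[k + 1])
    return ex, hy


ex, hy = fdtd_1d()
print(max(abs(v) for v in ex))  # peak |Ex| after 100 steps
```

In a distributed version, the two inner loops are the per-subdomain work, and the only communication is the boundary `hy`/`ex` value exchanged with each neighbour between the two half-steps.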
Dec, 18
GPU Accelerated Semiclassical Initial Value Representation Molecular Dynamics
This paper presents a graphics processing unit (GPU) implementation of the semiclassical initial value representation (SC-IVR) propagator for vibrational molecular spectroscopy calculations. The time-averaging formulation of the SC-IVR for power spectrum calculations is employed. Details about the CUDA implementation of the semiclassical code are provided. Four molecules with an increasing number of atoms are considered […]
Dec, 17
Data Structures for Task-based Priority Scheduling
Many task-parallel applications can benefit from attempting to execute tasks in a specific order, as for instance indicated by priorities associated with the tasks. We present three lock-free data structures for priority scheduling with different trade-offs on scalability and ordering guarantees. First we propose a basic extension to work-stealing that provides good scalability, but cannot […]
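The lock-free structures themselves are beyond a short example, but the baseline they are measured against can be sketched: one shared priority queue that always hands out the highest-priority ready task. The sketch below uses Python's `heapq` behind a `threading.Lock`, so it is deliberately *not* lock-free; it sits at the opposite end of the trade-off from work-stealing, giving strict priority order while every worker contends on a single lock. The class name, method names, and the "lower number means higher priority" convention are hypothetical.

```python
import heapq
import itertools
import threading


class CentralPriorityScheduler:
    """Shared task pool: pop() returns the highest-priority task (lowest number).

    A single global lock gives strict ordering but serializes all workers;
    this is NOT a lock-free structure.
    """

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()
        self._seq = itertools.count()  # tie-breaker: FIFO among equal priorities

    def push(self, priority, task):
        with self._lock:
            heapq.heappush(self._heap, (priority, next(self._seq), task))

    def pop(self):
        """Return the next task, or None if the pool is empty."""
        with self._lock:
            if not self._heap:
                return None
            return heapq.heappop(self._heap)[2]


def run_workers(scheduler, n_workers=4):
    """Drain the scheduler with several threads and collect each task's return value."""
    results = []
    results_lock = threading.Lock()

    def worker():
        while (task := scheduler.pop()) is not None:
            value = task()
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


sched = CentralPriorityScheduler()
for prio in [5, 1, 3]:
    sched.push(prio, lambda p=prio: p)
print([sched.pop()() for _ in range(3)])  # [1, 3, 5]
```

Work-stealing removes the single lock by giving each worker its own queue, which is where the scalability comes from and also why a global priority order is no longer guaranteed.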
Dec, 17
Development methodologies for GPU and cluster of GPUs
This chapter outlines several development methodologies for obtaining efficient code in classical scientific applications. These methodologies are based on feedback from several research projects involving GPUs, either alone in a single machine or in a cluster of machines. Indeed, our past collaborations with industry have allowed us to point out that in […]
Dec, 17
OpenCL Accelerated Multi-GPU Cone-Beam Reconstruction
Volume reconstruction in cone-beam CT is a computationally demanding task. In recent years, reconstruction has been accelerated using Graphics Processing Units (GPUs). Frameworks for general-purpose computation on GPUs are a proven tool for accessing the resources of graphics cards. With the Open Computing Language (OpenCL), the first open standard for cross-vendor and cross-platform programming […]