11147

Posts

Dec, 20

Warps and Atomics: Beyond Barrier Synchronization in the Verification of GPU Kernels

We describe the design and implementation of methods to support reasoning about data races in GPU kernels where constructs other than the standard barrier primitive are used for synchronization. At one extreme we consider kernels that exploit implicit, coarse-grained synchronization between threads in the same warp, a feature provided by many architectures. At the other […]
Dec, 20

Pannotia: Understanding Irregular GPGPU Graph Applications

GPUs have become popular recently to accelerate general-purpose data-parallel applications. However, most existing work has focused on GPU-friendly applications with regular data structures and access patterns. While a few prior studies have shown that some irregular workloads can also achieve speedups on GPUs, this domain has not been investigated thoroughly. Graph applications are one such […]
Dec, 20

Adapting the GA Approach to Solve Traveling Salesman Problems on CUDA Architecture

The vehicle routing problem (VRP) is one of the most challenging combinatorial optimization problems, which has been studied for several decades. The number of solutions for VRP increases exponentially while the number of points, which must be visited increases. There are 3.0×10^64 different solutions for 50 visiting points in a direct solution, and it is […]
Dec, 20

XSD: Accelerating MapReduce by Harnessing the GPU inside an SSD

Considerable research has been conducted recently on near-data processing techniques as real-world tasks increasingly involve large-scale and high-dimensional data sets. The advent of solid-state drives (SSDs) has spurred further research because of their processing capability and high internal bandwidth. However, the data processing capability of conventional SSD systems have not been impressive. In particular, they […]
Dec, 20

Towards global composition of performance-aware components for GPU-based systems

An important program optimization especially for heterogeneous parallel systems is performance-aware implementation selection which is (static or dynamic) selection between multiple implementation variants for the same computation, depending on the current execution context (such as currently available resources or performance affecting parameter values). Doing it for multiple component calls inside a program while considering interferences […]
Dec, 19

Optimizing GPU to GPU Communication on Cray XK7

When developing an application for Cray XK7 systems, optimization of compute kernels is only a small part of maximizing scaling and performance. Programmers must consider the effect of the GPU’s distinct address space and the PCIe bus on application scalability. Without such considerations applications rapidly become limited by transfers to and from the GPU and […]
Dec, 19

Experiences Porting a Molecular Dynamics Code to GPUs on a Cray XK7

GPU computing has rapidly gained popularity as a way to achieve higher performance of many scientific applications. In this paper we report on the experience of porting a hybrid MPI+OpenMP molecular dynamics code to a GPU enabled CrayXK7 to make a hybrid MPI+GPU code. The target machine, Indiana University’s Big Red II, consists of a […]
Dec, 19

A Data-Driven Model for Anisotropic Heterogeneous Subsurface Scattering

We present a new BSSRDF representation for editing measured anisotropic heterogeneous translucent materials, such as veined marble, jade, artificial stones with lighting-blocking discontinuities. Our work is inspired by the SubEdit representation introduced in [1]. Our main contribution is to improve the accuracy of the approximation while keeping it compact and efficient for editing.We decompose the […]
Dec, 19

A Two-stage Query by Singing/Humming System on GPU

This paper proposes the use of GPU (graphic processing unit) to implementing a two-stage comparison method for a QBSH (query by singing/humming) system. The system can take a user’s singing or humming and retrieve the top-10 most likely candidates from a database of 8431 songs. In order to speed up the comparison, we apply linear […]
Dec, 19

Heterogeneous Programming with Single Operation Multiple Data

Heterogeneity is omnipresent in today’s commodity computational systems, which comprise at least one multi-core Central Processing Unit (CPU) and one Graphics Processing Unit (GPU). Nonetheless, all this computing power is not being exploited in mainstream computing, as the programming of these systems entails many details of the underlying architecture and of its distinct execution models. […]
Dec, 18

Tesla vs. Xeon Phi vs. Radeon A Compiler Writer’s Perspective

Today, most CPU+Accelerator systems incorporate NVIDIA GPUs. Intel Xeon Phi and the continued evolution of AMD Radeon GPUs make it likely we will soon see, and want to program, a wider variety of CPU+Accelerator systems. PGI already supports NVIDIA GPUs, and is working to add support for Xeon Phi and AMD Radeon. Here we explore […]
Dec, 18

Fast Image Alignment with Fourier Moment Matching on GPU

In this paper, we develop a fast and accurate image alignment system which can be applied to image sequences in real time. The proposed image alignment system consists of two main components: the development of Fourier moment matching system and the implementation of the system in GPU. The Fourier moment matching is to efficiently find […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: