high performance computing on graphics processing units: hgpu.org

Posts

Feb, 25

Achieving a single compute device image in OpenCL for multiple GPUs

In this paper, we propose an OpenCL framework that combines multiple GPUs and treats them as a single compute device. Providing a single virtual compute device image to the user makes an OpenCL application written for a single GPU portable to the platform that has multiple GPU devices. It also makes the application exploit full […]

CUDA • OpenCL

Feb, 17

OpenCL Evaluation for Numerical Linear Algebra Library Development

With the help of CUDA [7], [6], many applications improved their performance by using GPUs. In our project called Matrix Algebra on GPU and Multicore Architectures (MAGMA) [10], we mainly focus on dense linear algebra routines similar to those from LAPACK [1]. Other than CUDA, there exist other frameworks that allow platform-independent programming for […]

OpenCL

Feb, 16

An experimental study on performance portability of OpenCL kernels

Accelerator processors allow energy-efficient computation at high performance, especially for computation-intensive applications. There exists a plethora of different accelerator architectures, such as GPUs and the Cell Broadband Engine. Each accelerator has its own programming language, but the recently introduced OpenCL language unifies accelerator programming languages. Hereby, OpenCL achieves functional portability, allowing to reduce the development […]

OpenCL

Jan, 22

Swan: A tool for porting CUDA programs to OpenCL

The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-independence […]

CUDA • OpenCL

Jan, 16

An OpenCL framework for heterogeneous multicores with local memory

In this paper, we present the design and implementation of an Open Computing Language (OpenCL) framework that targets heterogeneous accelerator multicore architectures with local memory. The architecture consists of a general-purpose processor core and multiple accelerator cores that typically do not have any cache. Each accelerator core, instead, has a small internal local memory. Our […]

OpenCL

Dec, 27

OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices such as multicore CPUs, GPUs, or other accelerators.

OpenCL

Nov, 12

Fast calculation of computer-generated-hologram on AMD HD5000 series GPU and OpenCL

In this paper, we report fast calculation of a computer-generated hologram (CGH) using a new architecture of the HD5000 series GPU (RV870) made by AMD and its new software development environment, OpenCL. Using an RV870 GPU and OpenCL, we can calculate a 1,920 × 1,024 resolution CGH from a 3D object consisting of 1,024 points in […]

OpenCL

Nov, 9

An Exploration of OpenCL for a Numerical Relativity Application

Currently there is considerable interest in making use of many-core processor architectures, such as Nvidia and AMD graphics processing units (GPUs), for scientific computing. In this work we explore the use of the Open Computing Language (OpenCL) for a typical Numerical Relativity application: a time-domain Teukolsky equation solver (a linear, hyperbolic, partial differential equation solver […]

OpenCL

Nov, 9

Numerical modeling of gravitational wave sources accelerated by OpenCL

In this work, we make use of the OpenCL framework to accelerate an EMRI modeling application using the hardware accelerators – Cell BE and Tesla CUDA GPU. We describe these compute technologies and our parallelization approach in detail, present our performance results, and then compare them with those from our previous implementations based on the […]

OpenCL

Nov, 5

A Performance Comparison of CUDA and OpenCL

CUDA and OpenCL offer two different interfaces for programming GPUs. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty. In this paper, […]

CUDA • OpenCL

Oct, 28

GPU packet classification using OpenCL: a consideration of viable classification methods

Packet analysis is an important aspect of network security, which typically relies on a flexible packet filtering system to extrapolate important packet information from each processed packet. Packet analysis is a computationally intensive, highly parallelisable task, and as such, classification of large packet sets, such as those collected by a network telescope, can require significant […]

OpenCL

Mar, 3

Parallel programming in mobile devices with FancyJCL

Mobile devices and handheld systems, such as the now-widespread smartphones and tablets, are becoming increasingly powerful. Their basic hardware configuration is usually a state-of-the-art heterogeneous architecture consisting of multi-core processors and some kind of accelerator, such as a GPU or DSP. Specific code adapted to the architecture is mandatory if high-performance computation is required and low-level […]

OpenCL