high performance computing on graphics processing units: hgpu.org

Posts

Nov, 20

HyPHI – task based hybrid execution C++ library for the Intel Xeon Phi coprocessor

The Intel Threading Building Blocks (TBB) C++ library introduced task parallelism to a wide audience of application developers. The library is easy to use and powerful, but it is limited to shared-memory machines. In this paper we present HyPHI, a novel library for the Intel Xeon Phi coprocessor for building applications which execute using a […]

Nov, 20

International Workshop on OpenCL, IWOCL 2014

The International Workshop on OpenCL (IWOCL) is an annual meeting of OpenCL users, researchers, developers and suppliers to share OpenCL best practice and to promote the evolution and advancement of the OpenCL standard. The meeting is open to anyone who is interested in contributing to, and participating in, the OpenCL community. IWOCL is the premier […]

Nov, 19

Real-time rendering of large surface-scanned range data natively on a GPU

This thesis presents research carried out for the visualisation of surface anatomy data stored as large range images, such as those produced by stereo-photogrammetric and other triangulation-based capture devices. As part of this research, I explored the use of points as a rendering primitive as opposed to polygons, and the use of range images as […]

OpenGL

Nov, 19

Adaptive implementation selection in the SkePU skeleton programming library

In earlier work, we have developed the SkePU skeleton programming library for modern multicore systems equipped with one or more programmable GPUs. The library internally provides four types of implementations (implementation variants) for each skeleton: serial C++, OpenMP, CUDA and OpenCL targeting either CPU or GPU execution respectively. Deciding which implementation would run faster for […]

CUDA • OpenCL

Nov, 19

A study of the speed and the accuracy of the Boundary Element Method as applied to the computational simulation of biological organs

In this work, a Fortran code is first developed for three-dimensional linear elastostatics using constant boundary elements; the code is based on a MATLAB code developed by the author earlier. Next, the code is parallelized using BLACS, MPI, and ScaLAPACK. Later, the parallelized code is used to demonstrate the usefulness of the Boundary Element […]

CUDA

Nov, 19

Implementation of the twisted mass fermion operator in the QUDA library

We discuss an extension of the QUDA library for the Wilson twisted mass operator. A performance analysis is presented for both degenerate and non-degenerate flavor doublets. The degenerate twisted mass fermion operator runs at up to 190, 487 and 856 Gflops for double, single and half precision, respectively, on recent NVIDIA Kepler GPUs, while our […]

CUDA

Nov, 19

An implicit multigrid solver for high-order compressible flow simulations on GPUs

The multigrid method has proved to be effective for a large class of numerical methods. In this study, a strategy based on the Full Approximation Storage (FAS) scheme is implemented together with the Full Multigrid (FMG) algorithm to accelerate convergence of steady-state solutions of the two-dimensional compressible Euler equations on a Graphics Processing Unit (GPU). The Beam […]

CUDA

Nov, 18

Neurokernel: An Open Scalable Software Framework for Emulation and Validation of Drosophila Brain Models on Multiple GPUs

The brain of the fruit fly Drosophila melanogaster is an extremely attractive model system for reverse engineering the emergent properties of neural circuits because it implements complex sensory-driven behaviors with a nervous system comprising a number of components that is five orders of magnitude smaller than those of mammals. A powerful toolkit of well-developed genetic […]

CUDA

Nov, 18

Integrating Multi-GPU Execution in an OpenACC Compiler

GPUs have become promising computing devices in current and future computer systems due to their high performance, high energy efficiency, and low price. However, the lack of high-level GPU programming models hinders the spread of GPU applications. To resolve this issue, OpenACC was developed as the first industry standard of a directive-based GPU programming […]

CUDA

Nov, 18

Specification and verification of GPGPU programs

Graphics Processing Units (GPUs) are increasingly used for general-purpose applications because of their low price, energy efficiency and enormous computing power. Considering the importance of GPU applications, it is vital that the behaviour of GPU programs can be specified and proven correct formally. This paper presents a logic to verify GPU kernels written in OpenCL, […]

OpenCL

Nov, 18

Probing the Statistical Validity of the Ductile-to-Brittle Transition in Metallic Nanowires Using GPU Computing

We perform a large-scale statistical analysis (> 2000 independent simulations) of the elongation and rupture of gold nanowires, probing the validity and scope of the recently proposed ductile-to-brittle transition that occurs with increasing nanowire length [Wu et al., Nano Lett., 12, 910-914 (2012)]. To facilitate a high-throughput simulation approach, we implement the second-moment approximation to […]

CUDA

Nov, 18

Performance and Power Comparisons Between Fermi and Cypress GPUs

In recent years, modern graphics processing units have been widely adopted in high-performance computing to solve large-scale computational problems. The leading GPU manufacturers Nvidia and AMD have introduced a series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects on processor cores […]

CUDA • OpenCL