Posts

Nov, 20

Multi-GPU Support on the Marrow Algorithmic Skeleton Framework

With the proliferation of general-purpose GPUs, workload parallelization and data transfer optimization have become an increasing concern. The natural evolution from using a single GPU is to multiply the number of available processors. This presents new challenges, such as tuning workload decomposition and load balancing, when dealing with heterogeneous systems. Higher-level programming is a very important asset in […]
Nov, 20

HyPHI – task based hybrid execution C++ library for the Intel Xeon Phi coprocessor

The Intel Threading Building Blocks (TBB) C++ library introduced task parallelism to a wide audience of application developers. The library is easy to use and powerful, but it is limited to shared-memory machines. In this paper we present HyPHI, a novel library for the Intel Xeon Phi coprocessor for building applications which execute using a […]
Nov, 20

International Workshop on OpenCL, IWOCL 2014

The International Workshop on OpenCL (IWOCL) is an annual meeting of OpenCL users, researchers, developers and suppliers to share OpenCL best practise, and to promote the evolution and advancement of the OpenCL standard. The meeting is open to anyone who is interested in contributing to, and participating in the OpenCL community. IWOCL is the premier […]
Nov, 19

Real-time rendering of large surface-scanned range data natively on a GPU

This thesis presents research carried out for the visualisation of surface anatomy data stored as large range images such as those produced by stereo-photogrammetric, and other triangulation-based capture devices. As part of this research, I explored the use of points as a rendering primitive as opposed to polygons, and the use of range images as […]
Nov, 19

Adaptive implementation selection in the SkePU skeleton programming library

In earlier work, we developed the SkePU skeleton programming library for modern multicore systems equipped with one or more programmable GPUs. The library internally provides four types of implementations (implementation variants) for each skeleton: serial C++, OpenMP, CUDA and OpenCL, targeting either CPU or GPU execution. Deciding which implementation would run faster for […]
Nov, 19

A study of the speed and the accuracy of the Boundary Element Method as applied to the computational simulation of biological organs

In this work, a Fortran code is first developed for three-dimensional linear elastostatics using constant boundary elements; the code is based on a MATLAB code previously developed by the author. Next, the code is parallelized using BLACS, MPI, and ScaLAPACK. Later, the parallelized code is used to demonstrate the usefulness of the Boundary Element […]
Nov, 19

Implementation of the twisted mass fermion operator in the QUDA library

We discuss an extension of the QUDA library for the Wilson twisted mass operator. A performance analysis is presented for both degenerate and non-degenerate flavor doublets. The degenerate twisted mass fermion operator runs at up to 190, 487 and 856 Gflops in double, single and half precision, respectively, on recent NVIDIA Kepler GPUs, while our […]
Nov, 19

An implicit multigrid solver for high-order compressible flow simulations on GPUs

The multigrid method has proved to be effective for a large class of numerical methods. In this study, a strategy based on the Full Approximation Storage (FAS) scheme is implemented together with the Full Multigrid (FMG) algorithm to accelerate the convergence of steady-state solutions of the two-dimensional compressible Euler equations on Graphics Processing Units (GPUs). The Beam […]
Nov, 18

Neurokernel: An Open Scalable Software Framework for Emulation and Validation of Drosophila Brain Models on Multiple GPUs

The brain of the fruit fly Drosophila melanogaster is an extremely attractive model system for reverse engineering the emergent properties of neural circuits because it implements complex sensory-driven behaviors with a nervous system comprising a number of components that is five orders of magnitude smaller than those of mammals. A powerful toolkit of well-developed genetic […]
Nov, 18

Integrating Multi-GPU Execution in an OpenACC Compiler

GPUs have become promising computing devices in current and future computer systems due to their high performance, high energy efficiency, and low price. However, the lack of high-level GPU programming models hinders the widespread adoption of GPU applications. To resolve this issue, OpenACC was developed as the first industry standard of a directive-based GPU programming […]
Nov, 18

Specification and verification of GPGPU programs

Graphics Processing Units (GPUs) are increasingly used for general-purpose applications because of their low price, energy efficiency and enormous computing power. Considering the importance of GPU applications, it is vital that the behaviour of GPU programs can be specified and proven correct formally. This paper presents a logic to verify GPU kernels written in OpenCL, […]
Nov, 18

Probing the Statistical Validity of the Ductile-to-Brittle Transition in Metallic Nanowires Using GPU Computing

We perform a large-scale statistical analysis (> 2000 independent simulations) of the elongation and rupture of gold nanowires, probing the validity and scope of the recently proposed ductile-to-brittle transition that occurs with increasing nanowire length [Wu et al., Nano Lett., 12, 910-914 (2012)]. To facilitate a high-throughput simulation approach, we implement the second-moment approximation to […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org