high performance computing on graphics processing units: hgpu.org

Posts

Jul, 1

Parallelizing the cellular potts model on GPU and multi-core CPU: An OpenCL cross-platform study

In this paper, we present the analysis and development of a cross-platform OpenCL parallelization of the Cellular Potts Model (CPM). In general, the evolution of the CPM is time-consuming. Using data-parallel programming model such as CUDA can accelerate the process, but it is highly dependent on the hardware type and manufacturer. Recently, OpenCL has attracted […]

OpenCL

Jul, 1

High-Level Programming Framework for Executing Streaming Applications on Heterogeneous OpenCL Platforms

As the computer industry is reaching more and more limits regarding processor speed and transistor size, they have to come up with complex new architectures and more efficient use of the available processing power. For application developers this can be a difficult task, because they have to be aware of low-level hardware properties and there […]

OpenCL

Jun, 26

On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms

The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge entails evaluating the efficacy of such parallel architectures and identifying the architectural innovations that ultimately benefit applications. To address this challenge, we need benchmarks that capture the execution patterns (i.e., dwarfs or motifs) of applications, both present and […]

OpenCL

Jun, 17

A Portable OpenCL Lattice Boltzmann Code for Multi- and Many-core Processor Architectures

The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task, that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one […]

CUDA

•

OpenCL

Jun, 16

Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels

Due to the diversity of processor architectures and application memory access patterns, the performance impact of using local memory in OpenCL kernels has become unpredictable. For example, enabling the use of local memory for an OpenCL kernel can be beneficial for the execution on a GPU, but can lead to performance losses when running on […]

OpenCL

Jun, 15

Toward OpenCL Automatic Multi-Device Support

To fully tap into the potential of today heterogeneous machines, offloading parts of an application on accelerators is no longer sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. […]

OpenCL

Jun, 14

Dynamic loop vectorization for executing OpenCL kernels on CPUs

Heterogeneous computing platforms are becoming increasingly important in supercomputing. Many systems now integrate CPUs and GPUs cooperating together on a single node. Much effort is invested in tuning GPU-kernels. However, it can be the case that some systems may not have GPUs or the GPUs are busy. Maintaining two versions of the same code for […]

OpenCL

Jun, 9

Efficient all-against-all protein similarity matrix computation using OpenCL

In this report we introduced CLSW, a fast GPU-based Smith-Waterman score-only-alignment calculator. While generally applicable for any protein alignment problem, it was designed specifically as a proof-of-concept application for SIMAP. Even if we had only two weeks to develop a fully functional, validated and optimized implementation and all related concepts, our results show that in […]

OpenCL

Jun, 9

GPU-Accelerated Dynamic Functional Connectivity Analysis for Functional MRI Data Using OpenCL

Intense computations in engineering and science, especially bioinformatics have been made practical by the recent advances in Graphical Processing Unit (GPU) computing technology. In this study, implementation and performance evaluations for a GPU-accelerated dynamic functional connectivity (DFC) analysis, which is an analysis method for investigating dynamic interactions among different brain networks, is presented. Open Computing […]

OpenCL

Jun, 9

3D Skeleton Extraction Method using Potential Field on OpenCL

For 3D skeleton extraction, the algorithm based on generalized potential fields, known as the outstandingly flexible and robust method, is suffering from seriously heavy computational burden. In this paper, we put forward a parallel algorithm based on OpenCL heterogeneous parallel framework, which can make full use of the great computing power provided by heterogeneous model […]

OpenCL

May, 30

Data Layout Optimization for Multi-Valued Containers in OpenCL

Scientific data is mostly multi-valued, e.g., coordinates, velocities, moments or feature components, and it comes in large quantities. The data layout of such containers has an enormous impact on the achieved performance, however, layout optimization is very time-consuming and error-prone because container access syntax in standard programming languages is not sufficiently abstract. This means that […]

OpenCL

May, 26

GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications

As more scientific workloads are moved into the cloud, the need for high performance accelerators increases. Accelerators such as GPUs offer improvements in both performance and power efficiency over traditional multi-core processors; however, their use in the cloud has been limited. Today, several common hypervisors support GPU passthrough, but their performance has not been systematically […]

CUDA

•

OpenCL

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

high performance computing on graphics processing units: hgpu.org

Posts

Parallelizing the cellular potts model on GPU and multi-core CPU: An OpenCL cross-platform study

High-Level Programming Framework for Executing Streaming Applications on Heterogeneous OpenCL Platforms

On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms

A Portable OpenCL Lattice Boltzmann Code for Multi- and Many-core Processor Architectures

Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels

Toward OpenCL Automatic Multi-Device Support

Dynamic loop vectorization for executing OpenCL kernels on CPUs

Efficient all-against-all protein similarity matrix computation using OpenCL

GPU-Accelerated Dynamic Functional Connectivity Analysis for Functional MRI Data Using OpenCL

3D Skeleton Extraction Method using Potential Field on OpenCL

Data Layout Optimization for Multi-Valued Containers in OpenCL

GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)