high performance computing on graphics processing units: hgpu.org

Posts

Sep, 23

Integrated Framework for Heterogeneous Embedded Platforms Using OpenCL

The technology community is rapidly moving away from the age of computers and laptops, and is entering the emerging era of hand-held devices. With the rapid development of smart phones, tablets, and pads, there has been widespread adoption of Graphics Processing Units (GPUs) in the embedded space. The hand-held market is now seeing an ever […]

OpenCL

Sep, 23

Embedding OpenCL in C++ for Expressive GPU Programming

We present a high performance GPU programming language, based on OpenCL, that is embedded in C++. Our embedding provides shared data structures, typesafe kernel invocation, and the ability to more naturally interleave CPU and GPU functions, similar to CUDA but with the portability of OpenCL. For expressivity, our language provides an abstraction that releases control […]

OpenCL • OpenGL

Sep, 8

Automatic OpenCL Device Characterization: Guiding Optimized Kernel Design

The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using a single unified programming interface and language. While the standard guarantees portability of functionality for complying applications and platforms, performance portability on such a diverse set of hardware is limited. Devices may vary significantly in memory architecture as well as […]

OpenCL

Sep, 8

Accelerating Clustering Coefficient Calculations on a GPU Using OPENCL

The growth in multicore CPUs and the emergence of powerful manycore GPUs have led to a proliferation of parallel applications. Many applications are not straightforward to parallelize. This paper examines the performance of a parallelized implementation for calculating measurements of complex networks. We present an algorithm for calculating a topological feature of complex networks, the clustering coefficient, […]

OpenCL

Aug, 18

ATI Stream Profiler: a tool to optimize an OpenCL kernel on ATI Radeon GPUs

Modern GPUs have been shown to be highly efficient machines for data-parallel applications such as graphics, image, video processing, or physical simulation applications. For example, a single ATI Radeon HD 5870 GPU has a theoretical peak of 2.72 teraflops (10^12 floating-point operations per second) with a video memory bandwidth of 153.6 GB/s. While it is […]

OpenCL

Aug, 18

Physical and graphical effects in OpenCL by example

There are strong indications that the future of interactive graphics involves a more flexible programming model than today’s OpenGL/Direct3D pipelines. That means that graphics developers will need a basic understanding of how to combine emerging parallel-programming techniques with the traditional interactive rendering pipeline. This course provides an introduction to parallel-programming architectures and environments for interactive […]

OpenCL

Aug, 18

Parallelization of the x264 encoder using OpenCL

With the introduction of H.264, the complexity of video encoders has increased dramatically. While hardware-based encoding solutions profit from the strict sequential design and already feature real-time capabilities for high-definition material, software solutions lack most of the encoding performance. More precisely, the performance of software encoders is limited due to the computation […]

OpenCL

Aug, 18

Simulating Biological-Inspired Spiking Neural Networks with OpenCL

The algorithms used for simulating biologically-inspired spiking neural networks (BIANN) often utilize functions which are computationally complex and have to model a large number of neurons – or even a much larger number of synapses in parallel. To use all available computing resources provided by a standard desktop PC is an opportunity to shorten the […]

OpenCL

Aug, 18

Parallel Batch Training of the Self-Organizing Map Using OpenCL

Self-Organizing Maps (SOMs) are popular artificial neural networks that are often used for data analysis through clustering and visualisation. The SOM's mathematical model is inherently parallel. However, many implementations have not successfully exploited its parallelism because previous attempts often required cluster-like infrastructures. This article presents the parallel implementation of SOMs, particularly the batch map variant […]

OpenCL

Aug, 18

Maestro: Data Orchestration and Tuning for OpenCL Devices

As heterogeneous computing platforms become more prevalent, the programmer must account for complex memory hierarchies in addition to the difficulties of parallel programming. OpenCL is an open standard for parallel computing that helps alleviate this difficulty by providing a portable set of abstractions for device memory hierarchies. However, OpenCL requires that the programmer explicitly control […]

OpenCL

Aug, 18

Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL

In this paper, we present OMPSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on three different architectures, SMP, Cell/B.E. and GPUs, showing the wide usefulness of the approach. The evaluation is done with four different benchmarks, Matrix Multiply, BlackScholes, […]

CUDA • OpenCL

Aug, 18

Analyzing program flow within a many-kernel OpenCL application

Many developers have begun to realize that heterogeneous multi-core and many-core computer systems can provide significant performance opportunities to a range of applications. Typical applications possess multiple components that can be parallelized; developers need to be equipped with proper performance tools to analyze program flow and identify application bottlenecks. In this paper, we analyze and […]

OpenCL

* * *

Recent source codes

Code examples for paper on SYCL backend of Kokkos - IWOCL 2024

ROCm's implementation of Gromacs

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code
