high performance computing on graphics processing units: hgpu.org

Posts

Jul, 13

Optimization, Specification and Verification of the Prefix Sum Program in an OpenCL Environment

The Prefix Sum is an algorithm used as a building block for various other algorithms, for example radix sort, quicksort and lexically comparing strings. Implementing the Prefix Sum algorithm on the CPU is trivial, but a parallel approach with OpenCL is more complicated. An implementation in OpenCL has been made, and optimized to minimize branch […]

OpenCL
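
As a minimal sketch of the Prefix Sum idea from the abstract above, the following Python compares the trivial sequential CPU version with a step-wise scan in the Hillis-Steele style, simulated on the CPU. This is not the paper's OpenCL implementation: the function names are hypothetical, and the Hillis-Steele scheme is only one common way to parallelize a scan, chosen here as an assumption for illustration.

```python
from typing import List


def inclusive_scan_sequential(values: List[int]) -> List[int]:
    """Sequential inclusive prefix sum: out[i] = values[0] + ... + values[i]."""
    out = []
    running = 0
    for v in values:
        running += v
        out.append(running)
    return out


def inclusive_scan_stepwise(values: List[int]) -> List[int]:
    """Hillis-Steele style scan, simulated on the CPU (illustrative only).

    Each pass of the while loop stands for one parallel step in which every
    element i >= offset adds the element `offset` positions to its left.
    """
    current = list(values)
    offset = 1
    while offset < len(current):
        # Read from the old list and build a new one (double buffering),
        # so that no element is read after it has already been updated.
        current = [
            current[i] + current[i - offset] if i >= offset else current[i]
            for i in range(len(current))
        ]
        offset *= 2
    return current


data = [3, 1, 7, 0, 4, 1, 6, 3]
print(inclusive_scan_sequential(data))
print(inclusive_scan_stepwise(data))
```

Both calls print the same list. The step-wise version needs about log2(n) passes instead of n dependent additions, at the cost of more total work; that trade-off is part of why a parallel implementation is harder to get right than the sequential one.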

Jul, 6

LTTng CLUST: A system-wide unified CPU and GPU tracing tool for OpenCL applications

As computation schemes evolve and many new tools become available to programmers to enhance the performance of their applications, many programmers have started to look towards highly parallel platforms such as Graphics Processing Units (GPUs). Offloading computations that can take advantage of the architecture of the GPU is a technique that has proven fruitful in recent […]

OpenCL

Jun, 30

CPU and GPU Implementation of QCD by using OpenCL

Many particle physics applications can now be parallelized using multicore platforms such as CPUs and GPUs. In this paper, we propose a parallel processing approach for a Quantum ChromoDynamics (QCD) application that uses both CPU and GPU. Instead of distributing the parallelizable workload to either CPU or GPU, we distribute the workload simultaneously into both CPU […]

OpenCL

Jun, 19

Parallel BTF Compression with Multi-Level Vector Quantization in OpenCL

The Bidirectional Texture Function (BTF), an effective high-fidelity representation of surface appearance, is becoming more and more widely used. In this paper we report on contributions to BTF data compression for multi-level vector quantization. We describe novel decompositions that improve the compression ratio by 15% in comparison with the original method, without loss of […]

OpenCL

Jun, 19

Visualization of OpenCL Application Execution on CPU-GPU Systems

Evaluating the performance of parallel and heterogeneous programs and architectures can be challenging. An emulator or simulator can be used to aid the programmer. To provide guidance and feedback to the programmer, the simulator needs to present traces, reports, and debugging information in a coherent and unambiguous format. Although these outputs contain a lot of […]

CUDA • OpenCL

Jun, 16

Parallelization of DIRA and CTmod using OpenMP and OpenCL

Parallelization answers the ever-growing demand for computing power by taking advantage of multi-core processor technology and modern many-core graphics compute units. Multi-core CPUs and many-core GPUs have the potential to substantially reduce the execution time of a program, but it is often a challenging task to ensure that all available hardware is […]

OpenCL

Jun, 8

Improving OpenCL Programmability with the Heterogeneous Programming Library

The use of heterogeneous devices is becoming increasingly widespread. Their main drawback is their low programmability, due to the large number of details that must be handled. Another important problem is reduced code portability, as most of the tools used to program them are vendor- or device-specific. The exception to this observation is OpenCL, which […]

OpenCL

Jun, 5

Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability

Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such systems, and offers functional portability. It does, however, suffer from poor performance portability: code tuned for one device must be re-tuned to achieve good […]

OpenCL

May, 25

Nucleation of nanoparticles in a coarse grained fluid using OpenCL

In this thesis, the nucleation rate of almost hard spheres in a coarse-grained fluid is measured to study the effects of an explicit solvent on the nucleation rate. Previous measurements show a discrepancy between physical measurements and simulations, where the latter all used implicit solvents. In this thesis, the fluid is approximated using Stochastic Rotation […]

OpenCL

May, 22

An Introduction to OpenCL C++

Today, servers, desktops, mobile devices, and embedded systems contain many processors in addition to the CPU that runs programs. These extra processors are generally called accelerators and could be a GPU, FPGA, Xeon Phi, or other programmable device. There are many types of accelerators available, from many vendors, for many different environments. Khronos developed the […]

OpenCL

May, 19

CHO: Towards a Benchmark Suite for OpenCL FPGA Accelerators

Programming FPGAs with OpenCL-based high-level synthesis frameworks is gaining attention, with a number of commercial and research frameworks announced. However, there are no benchmarks for evaluating these frameworks. To this end, we present the CHO benchmark suite, an extension of CHStone, a commonly used C-based high-level synthesis benchmark suite, for OpenCL. We characterise CHO at various […]

OpenCL

May, 3

IPMACC: Translating OpenACC API to OpenCL

In this paper, we introduce IPMACC, a framework for executing OpenACC for C applications over the OpenCL runtime. We use our framework to compare the performance of OpenACC and OpenCL. OpenACC API abstractions remove low-level control from programmers’ hands. To understand the low-level OpenCL optimizations that are not applicable in OpenACC, we compare highly-optimized OpenCL and […]

CUDA • OpenCL

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)