high performance computing on graphics processing units: hgpu.org

Posts

Jan, 9

International Workshop on OpenCL

The International Workshop on OpenCL (IWOCL – “eye-wok-ul”) is an annual meeting and community of users, researchers, developers and suppliers that share best practice, and promote the evolution and advancement of the OpenCL standard for parallel programming of heterogeneous systems.

Jan, 8

CHO: A Benchmark Suite for OpenCL-based FPGA Accelerators

Programming FPGAs with OpenCL-based high-level synthesis frameworks is gaining attention, with a number of commercial and research frameworks announced. However, there are no benchmarks for evaluating these frameworks. To this end, we present the CHO benchmark suite, an extension of CHStone, a commonly used C-based high-level synthesis benchmark suite, for OpenCL. We characterise CHO at various […]

OpenCL

Jan, 2

Customization of OpenCL Applications for Efficient Task Mapping under Heterogeneous Platform Constraints

When targeting an OpenCL application to platforms with multiple heterogeneous accelerators, task tuning and mapping have to cope with device-specific constraints. To address this problem, we present an innovative design flow for the customization and performance optimization of OpenCL applications on heterogeneous parallel platforms. It consists of two phases: 1) a tuning phase that optimizes […]

OpenCL

Jan, 2

Performance comparison of Lattice Boltzmann fluid flow simulation using OpenCL and CUDA frameworks

This paper presents a performance comparison between the CUDA and OpenCL parallel programming frameworks, using a lid-driven cavity flow simulation with the Lattice Boltzmann method as the example. CUDA is a parallel programming model developed by NVIDIA for leveraging the computing capabilities of their products. OpenCL is an open, royalty-free standard developed by the Khronos Group for parallel programming of heterogeneous devices […]

CUDA • OpenCL

Dec, 30

Characterization of OpenCL on a Scalable FPGA Architecture

The recent release of Altera’s SDK for OpenCL has greatly eased the development of FPGA-based systems. Research has shown performance improvements brought by OpenCL using a single FPGA device. However, to meet the objectives of high performance computing, OpenCL needs to be evaluated using multiple FPGAs. This work proposes a scalable FPGA architecture for […]

OpenCL

Dec, 30

Extending OmpSs to support CUDA and OpenCL in C, C++ and Fortran Applications

CUDA and OpenCL are the most widely used programming models to exploit hardware accelerators. Both programming models provide a C-based programming language to write accelerator kernels and a host API used to glue the host and kernel parts. Although this model is a clear improvement over a low-level and ad-hoc programming model for each hardware […]

CUDA • OpenCL

Dec, 20

A Parallel Recursive Approach for Solving All Pairs Shortest Path Problem on GPU using OpenCL

The all-pairs shortest path (APSP) problem has a large number of practical applications in the real world. We present a highly parallel and recursive solution to the APSP problem based on Kleene’s algorithm. The proposed parallel approach for APSP is implemented using OpenCL, an open standard framework that provides a development environment for utilizing massive parallel […]

OpenCL

Dec, 16

An Optimized GPU Memory Hierarchy Design for an OpenCL Kernel

With the advent of multi- and many-core processors, communication has replaced computation as the performance bottleneck. Most current approaches to the problem try to tolerate memory access latency through a high degree of thread-level parallelism. However, not all applications benefit from such techniques and there is a need to address the weakness of the underlying […]

OpenCL

Dec, 9

Portable OpenCL Out-of-Order Execution Framework for Heterogeneous Platforms

Heterogeneous computing has become a viable option in seeking computing performance, alongside conventional homogeneous multi-/single-processor approaches. The advantage of heterogeneity is the possibility to choose the best device on the platform for each distinct workload in the application, to gain performance and/or to lower power consumption. The drawback of heterogeneity is the […]

OpenCL

Dec, 5

IPMACC: Open Source OpenACC to CUDA/OpenCL Translator

In this paper we introduce IPMACC, a framework for translating OpenACC applications to CUDA or OpenCL. IPMACC is composed of a set of translators that translate OpenACC for C applications to CUDA or OpenCL. The framework uses the system compiler (e.g. nvcc) to generate the final accelerator binary. The framework can be used for extending the OpenACC API, […]

CUDA • OpenCL

Dec, 3

OpenCL Based High-Quality HEVC Motion Estimation on GPU

This paper presents a high-quality H.265/HEVC motion estimation implementation that uses the CPU and GPU in cooperation. The data dependency introduced by the MVP (Motion Vector Predictor) restricts the degree of parallelism on the GPU. To overcome the constraint from the MVP, we propose to use an estimated MVP on the GPU and the accurate MVP to refine the motion […]

OpenCL

Nov, 29

A Framework for Composing High-Performance OpenCL from Python Descriptions

Parallel processors have become ubiquitous; most programmers today have access to parallel hardware such as multi-core processors and graphics processors. This has created an implementation gap, where efficiency programmers with knowledge of hardware details can attain high performance by exploiting parallel hardware, while productivity programmers with application-level knowledge may not understand low-level performance trade-offs. Ideally, […]

OpenCL
