high performance computing on graphics processing units: hgpu.org

Posts

Dec, 19

Investigation of the SYCL for OpenCL Programming Model

OpenCL and SYCL for OpenCL are open-standard programming models which enable development of parallel programs which target heterogeneous hardware: systems which contain both general-purpose CPUs and accelerator devices such as GPGPUs or Intel Xeon Phi cards. While OpenCL provides a C API, SYCL provides a C++ API and allows programmers to take advantage of many […]

OpenCL

Dec, 12

A Scalable Lane Detection Algorithm on COTSs with OpenCL

Road lane detection is a classical requirement for advanced driving assistant systems. With new computer technologies, lane detection algorithms can be exploited on COTS platforms. This paper investigates the use of OpenCL and develops a particle-filter based lane detection algorithm that can tune the trade-off between detection accuracy and speed. Our algorithm is tested on 14 […]

OpenCL

Dec, 8

A Semi-Automated Tool Flow for Roofline Analysis of OpenCL Kernels on Accelerators

We propose a tool-flow methodology that can be applied to analyze and track the performance of OpenCL applications on heterogeneous platforms. Using a case study on a datacenter representative workload, we evaluate our tool flow on three distinct heterogeneous platforms and demonstrate how it can be employed more widely to provide insight and track attainable […]

OpenCL

Dec, 4

An Accelerator based on the rho-VEX Processor: an Exploration using OpenCL

In recent years, the use of co-processors to accelerate specific tasks has become more common. To simplify the use of these accelerators in software, the OpenCL framework has been developed. This framework provides programs with a cross-platform interface for using accelerators. The rho-VEX processor is a run-time reconfigurable VLIW processor. It allows run-time switching of configurations, […]

OpenCL

Dec, 1

Bridging OpenCL and CUDA: A Comparative Analysis and Translation

Heterogeneous systems are widening their user-base, and heterogeneous computing is becoming popular in supercomputing. Among others, OpenCL and CUDA are the most popular programming models for heterogeneous systems. Although OpenCL inherited many features from CUDA and they have almost the same platform model, they are not compatible with each other. In this paper, we present […]

CUDA • OpenCL

Nov, 25

Optimization of a Machine Learning Algorithm on the Heterogeneous system using OpenCL

Today, no one disputes how important data is in every industry, especially in the enterprise market. More recently, the key point that decides the survival of a business is the management of its big data, which is defined by the 3V’s: Volume, Velocity, and Variety [1]. While the rate of data generation […]

OpenCL

Nov, 11

Autotuning OpenCL Workgroup Size for Stencil Patterns

Selecting an appropriate workgroup size is critical for the performance of OpenCL kernels, and requires knowledge of the underlying hardware, the data being operated on, and the implementation of the kernel. This makes portable performance of OpenCL programs a challenging goal, since simple heuristics and statically chosen values fail to exploit the available performance. To […]

OpenCL

Nov, 3

A Framework for Transparent Execution of Massively-Parallel Applications on CUDA and OpenCL

We present a novel framework for simultaneous development for different massively parallel platforms. Currently, our framework supports CUDA and OpenCL, but it can be easily adapted to other programming languages. The main idea is to provide an easy-to-use abstraction layer that encapsulates calls to one's own parallel device code as well as library functions. […]

CUDA • OpenCL

Oct, 27

CFP: Fourth International Workshop on OpenCL (IWOCL 2016)

* Call for Papers * Now in its fourth year, the International Workshop on OpenCL (IWOCL) will be hosted by TU Wien in Vienna, Austria, at the C3 Convention Center on April 19th – 21st 2016. April 19th is reserved for an Advanced Hands On OpenCL tutorial with April 20th – 21st consisting of a […]

Oct, 25

Execution of Compound Multi-Kernel OpenCL Computations in Multi-CPU/Multi-GPU Environments

Current computational systems are heterogeneous by nature, featuring a combination of CPUs and GPUs. As the latter are becoming an established platform for high-performance computing, the focus is shifting towards the seamless programming of these hybrid systems as a whole. The distinct nature of the architectural and execution models in place raises several challenges, as […]

OpenCL

Oct, 18

Implementation of a Power Efficient Synthetic Aperture Radar Back Projection Algorithm on FPGAs Using OpenCL

In this thesis, an implementation of a Synthetic Aperture Radar (SAR) back projection algorithm onto a Field-Programmable Gate Array (FPGA) device using Open Computing Language (OpenCL) is developed. SAR back projection is a method to form a high-resolution terrain image from radar data. SAR is used in many applications such as Geographic Information Systems (GIS), […]

OpenCL

Oct, 16

Comparison of Thread Execution Methods for GPU-oriented OpenCL Programs on Multicore Processors

With the broad deployment of multicore processors, there are increasing demands to port OpenCL programs written for GPUs onto the multicore processors. However, OpenCL programs written for GPUs cannot run efficiently on multicore processors since GPU-oriented OpenCL programs generally consist of a huge number of threads. This paper presents experimental comparisons of three thread execution […]

OpenCL
