high performance computing on graphics processing units: hgpu.org

Posts

Mar, 3

Hadoop Mapreduce OpenCL Plugin

Modern systems generate huge amounts of information in areas such as finance, telematics, healthcare, and IoT devices, to name a few. Modern computing frameworks like MapReduce need an ever-increasing amount of computing power to sort, arrange, and generate insights from the data. This project is an attempt to harness the power of heterogeneous […]

OpenCL

Feb, 23

VirtCL: a framework for OpenCL device abstraction and management

The interest in using multiple graphics processing units (GPUs) to accelerate applications has increased in recent years. However, the existing heterogeneous programming models (e.g., OpenCL) abstract details of GPU devices at the per-device level and require programmers to explicitly schedule their kernel tasks on a system equipped with multiple GPU devices. Unfortunately, multiple applications running […]

OpenCL

Feb, 23

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL

OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices. The OpenCL specification tightly binds its workflow abstraction, or "command queue", to a specific device for the entire program. For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort […]

OpenCL

Feb, 19

Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems

General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This article presents a compiler based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. A key feature of our scheme is that […]

OpenCL

Feb, 8

Workload distribution and balancing in FPGAs and CPUs with OpenCL and TBB

In this paper we evaluate the performance and energy efficiency of FPGA and CPU devices for a class of parallel computing applications in which the workload can be distributed in a way that enables simultaneous computing in addition to simple offloading. The FPGA device is programmed via OpenCL using the recent availability of commercial […]

OpenCL

Feb, 4

A Performance Analysis Framework for Optimizing OpenCL Applications on FPGAs

Recently, FPGA vendors such as Altera and Xilinx have released OpenCL SDKs for programming FPGAs. However, the architecture of an FPGA is significantly different from that of a CPU or GPU, for which OpenCL was originally designed. Tuning OpenCL code for good performance on FPGAs is still an open problem, since the existing OpenCL tools and models designed […]

OpenCL

Jan, 29

GPU-Accelerated Recurrent Neural Networks: OpenCLLink and SymbolicC

The paper presents an application of OpenCLLink in Wolfram Mathematica to accelerate fully recurrent neural networks using a GPU. We also show the idea of automatically generating parts of the source code using SymbolicC.

OpenCL

Jan, 14

A Case for Work-stealing on FPGAs with OpenCL Atomics

We provide a case study of work-stealing, a popular method for run-time load balancing, on FPGAs. Following the Cederman-Tsigas implementation for GPUs, we synchronize work-items not with locks, mutexes or critical sections, but instead with the atomic operations provided by Altera’s OpenCL SDK. We evaluate work-stealing for FPGAs by synthesizing a K-means clustering algorithm on […]

OpenCL

Dec, 31

Study of basic vector operations on Intel Xeon Phi and NVIDIA Tesla using OpenCL

The present work is an analysis of the performance of the basic vector operations AXPY, DOT and SpMV using OpenCL. The code was tested on the NVIDIA Tesla S2050 GPU and Intel Xeon Phi 3120A coprocessor. Due to the nature of the AXPY function, only two versions were implemented, the routine to be executed by […]

OpenCL
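
For readers unfamiliar with the three operations named in the entry above, the following is a minimal reference sketch in plain Python. It is not the paper's OpenCL code and says nothing about its performance results. The function names are hypothetical, and the CSR (compressed sparse row) layout used for SpMV is an assumption, since the excerpt does not say which sparse format the authors used.

```python
def axpy(alpha, x, y):
    """AXPY: return alpha * x + y, computed element by element."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """DOT: return the inner product of x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


def spmv_csr(values, col_idx, row_ptr, x):
    """SpMV: return A @ x for a sparse matrix A stored in CSR form.

    values  : nonzero entries of A, row by row
    col_idx : column index of each entry in values
    row_ptr : row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1
    (CSR is an assumed format, chosen here for illustration.)
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

In a data-parallel setting, each output element of AXPY and each row of SpMV can be computed independently, while DOT additionally needs a reduction over the partial products.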

Dec, 23

SqueezCL: Squeezing OpenCL Kernels for Approximate Computing on Contemporary GPUs

Approximate computing provides an opportunity for exploiting application characteristics to improve performance of computing systems. However, such opportunity must be balanced against generality of methods and quality guarantees that the system designer can provide to the application developer. Improved parallel processing in graphics processing units (GPUs) provides one such means for data-level parallel applications. We […]

OpenCL

Dec, 19

Investigation of the SYCL for OpenCL Programming Model

OpenCL and SYCL for OpenCL are open-standard programming models which enable development of parallel programs which target heterogeneous hardware: systems which contain both general-purpose CPUs and accelerator devices such as GPGPUs or Intel Xeon Phi cards. While OpenCL provides a C API, SYCL provides a C++ API and allows programmers to take advantage of many […]

OpenCL

Dec, 12

A Scalable Lane Detection Algorithm on COTSs with OpenCL

Road lane detection is a classical requirement for advanced driver assistance systems. With new computer technologies, lane detection algorithms can be exploited on COTS platforms. This paper investigates the use of OpenCL and develops a particle-filter based lane detection algorithm that can tune the trade-off between detection accuracy and speed. Our algorithm is tested on 14 […]

OpenCL
