Posts

Jul, 10

FPGA Implementation of Bluetooth Low Energy Physical Layer with OpenCL

This dissertation primarily presents the design of the Digital Signal Processing (DSP) for transmission in the Bluetooth Low Energy Physical Layer (BLE PHY), and its implementation in a Field Programmable Gate Array (FPGA) device with Open Computing Language (OpenCL). The DSP design is based on the In-Phase/Quadrature-Phase (IQ) architecture to construct the modulation […]
Apr, 17

Performance Comparison of Different OpenCL Implementations of LBM Simulation on Commodity Computer Hardware

Parallel programming is increasingly used to improve the performance of numerical methods used for scientific purposes. Numerical methods in the field of fluid dynamics require a large number of operations per second. One method that is easily parallelized and often used is the Lattice Boltzmann method (LBM). Today, it […]
Feb, 20

A ML-based resource utilization OpenCL GPU-kernel fusion model

Massive data parallelism can be achieved by using general-purpose graphics processing units (GPGPU) with the help of the OpenCL framework. When smaller data is executed on a GPU with higher memory, the result is a low resource utilization ratio and energy inefficiency. Until now, there has been no model for sharing the GPU for further execution. In […]
Jan, 16

Fancier: A Unified Framework for Java, C, and OpenCL Integration

Graphics Processing Units (GPUs) have evolved from very specialized designs geared towards computer graphics to accommodate general-purpose, highly parallel workloads. Harnessing the performance that these accelerators provide requires the use of specialized native programming interfaces, such as CUDA or OpenCL, or higher-level programming models like OpenMP or OpenACC. However, in managed programming languages, offloading execution into […]
Jan, 16

Studying the Potential of Automatic Optimizations in the Intel FPGA SDK for OpenCL

High Level Synthesis (HLS) tools, like the Intel FPGA SDK for OpenCL, improve design productivity and enable efficient design space exploration guided by simple program directives (pragmas), but may sometimes miss important optimizations necessary for high performance. In this paper, we present a study of the tradeoffs in HLS optimizations, and the potential of a […]
Dec, 26

OpenCL-HPX Integration

Distributed applications combine the computational capabilities of heterogeneous nodes. As such, they offer challenges regarding data transfer and synchronization. HPX is a library for concurrent, parallel applications. It strives not only to address challenges regarding distributed systems, but also to conform to current and upcoming C++ standards. One of the solutions found in heterogeneous systems […]
Dec, 26

FSpGEMM: An OpenCL-based HPC Framework for Accelerating General Sparse Matrix-Matrix Multiplication on FPGAs

General sparse matrix-matrix multiplication (SpGEMM) is an integral part of many scientific computing, high-performance computing (HPC), and graph analytic applications. This paper presents a new compressed sparse vector (CSV) format for representing sparse matrices and FSpGEMM, an OpenCL-based HPC framework for accelerating general sparse matrix-matrix multiplication on FPGAs. The proposed FSpGEMM framework includes an FPGA […]
Dec, 19

Optimization of Compiler-generated OpenCL CNN Kernels and Runtime for FPGAs

This work explores the viability of end-to-end convolutional neural network inference using OpenCL HLS kernels generated from TVM on Intel FPGAs. We explore layer-pipelined execution for small networks and time-multiplexed kernels for larger CNNs. Naively generated kernels do not produce efficient hardware. We propose a set of optimizations to increase parallelism, resource utilization, and more […]
Nov, 28

Concurrency Mapping to FPGAs with OpenCL: A Case Study with a Shallow Water Kernel

FPGAs have been around for over 30 years and are a viable accelerator for compute-intensive workloads on HPC systems. The adoption of FPGAs for scientific applications has been stimulated recently by the emergence of better programming environments such as High-Level Synthesis (HLS) and OpenCL available through the Xilinx SDSoC design tool. The mapping of the […]
Aug, 8

PoCL-R: A Scalable Low Latency Distributed OpenCL Runtime

Offloading the most demanding parts of applications to an edge GPU server cluster to save power or improve the result quality is a solution that becomes increasingly realistic with new networking technologies. In order to make such a computing scheme feasible, an application programming layer that can provide both low latency and scalable utilization of […]
Jul, 25

A method for decompilation of AMD GCN kernels to OpenCL

Introduction: Decompilers are useful tools for software analysis and support in the absence of source code. They are available for many hardware architectures and programming languages. However, none of the existing decompilers supports modern AMD GPU architectures such as AMD GCN and RDNA. Purpose: We aim to develop the first assembly decompiler tool for a […]
Jul, 18

OpenCL FPGA Optimization guided by memory accesses and roofline model analysis applied to tomography acceleration

Backward projection is one of the most time-consuming steps in method-based iterative reconstruction computed tomography. The 3D backprojection memory access pattern is potentially regular enough to efficiently exploit the computation power of acceleration boards based on GPU or FPGA. High-level tools like HLS or OpenCL make it easier to consider such particular memory accesses during the design […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
