24544

Posts

Feb, 14

Transparent FPGA Acceleration with TensorFlow

Today, artificial neural networks are one of the major innovators pushing the progress of machine learning. This has particularly affected the development of neural network accelerating hardware. However, since most of these architectures require specialized toolchains, there is a certain amount of additional effort for developers each time they want to make use of a […]
Jan, 31

C-for-Metal: High Performance SIMD Programming on Intel GPUs

The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized. To close this performance gap […]
Dec, 27

CNN2Gate: An Implementation of Convolutional Neural Networks Inference on FPGAs with Automated Design Space Exploration

Convolutional Neural Networks (CNNs) have a major impact on our society, because of the numerous services they provide. These services include, but are not limited to image classification, video analysis, and speech recognition. Recently, the number of researches that utilize FPGAs to implement CNNs are increasing rapidly. This is due to the lower power consumption […]
Dec, 6

Exploring FPGA Optimizations to Compute Sparse Numerical Linear Algebra Kernels

The solution of sparse triangular linear systems (sptrsv) is the bottleneck of many numerical methods. Thus, it is crucial to count with efficient implementations of such kernel, at least for commonly used platforms. In this sense, Field–Programmable Gate Arrays (FPGAs) have evolved greatly in the last years, entering the HPC hardware ecosystem largely due to […]
Nov, 29

Evaluating the Performance and Portability of Contemporary SYCL Implementations

SYCL is a single-source programming model for heterogeneous systems; it promises improved maintainability, productivity, and opportunity for compiler optimization, when compared to accelerator specific programming models. Several implementations of the SYCL standard have been developed over the past few years, including several backends using contemporary accelerator languages, like OpenCL, CUDA, and HIP. These implementations vary […]
Nov, 29

HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC

Hardware-agnostic programming with high performance portability will be the bedrock for realizing the ubiquitous adoption of emerging accelerator technologies in future heterogeneous high-performance computing (HPC) systems, which is the key to achieving the next level of HPC performance on an expanding accelerator landscape. In this paper, we present HALO 1.0, an open-ended extensible multi-agent software […]
Nov, 22

GPURepair: Automated Repair of GPU Kernels

This paper presents a tool for repairing errors in GPU kernels written in CUDA or OpenCL due to data races and barrier divergence. Our novel extension to prior work can also remove barriers that are deemed unnecessary for correctness. We implement these ideas in our tool called GPURepair, which uses GPUVerify as the verification oracle […]
Nov, 15

Automatic GPU optimization through higher-order functions in functional languages

Over recent years, graphics processing units (GPUs) have become popular devices to use in procedures that exhibit data-parallelism. Due to high parallel capability, running procedures on a GPU can result in an execution time speedup ranging from a couple times faster to several orders of magnitude faster, compared to executing serially on a central processing […]
Nov, 8

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

In recent years, heterogeneous computing has emerged as the vital way to increase computers’ performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The rationale behind this trend is that different parts of an application can be offloaded from the main CPU to […]
Nov, 8

AMGCL – A C++ library for efficient solution of large sparse linear systems

AMGCL is a header-only C++ library for the solution of large sparse linear systems with algebraic multigrid. The method may be used as a black-box solver for computational problems in various fields, since it does not require any information about the underlying geometry. AMGCL provides an efficient, flexible, and extensible implementation of several iterative solvers […]
Oct, 4

Transparent Acceleration of Java-based Deep Learning Engines

The advent of modern cloud services, along with the huge volume of data produced on a daily basis, have increased the demand for fast and efficient data processing. This demand is common among numerous application domains, such as deep learning, data mining, and computer vision. In recent years, hardware accelerators have been employed as a […]
Sep, 27

Extending High-Level Synthesis for Task-Parallel Programs

C/C++/OpenCL-based high-level synthesis (HLS) becomes more and more popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of result (QoR) and short development cycle compared with the traditional register-transfer level (RTL) design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: