Posts

Mar, 24

Accelerating ternary quantized convolutional neural networks using OpenCL for FPGA

FPGAs balance the reprogrammability of CPUs and the performance of ASICs. They seem the perfect solution to increase the throughput of neural networks. However, they must also prove to be highly competitive in a market dominated by GPUs. To achieve this, we focus on the strength of FPGAs that cannot be taken advantage of on […]
Feb, 24

An Empirically Guided Optimization Framework for FPGA OpenCL

FPGAs have been demonstrated to be capable of very high performance, especially power-performance, but generally at the cost of hand-tuned HDL code by FPGA experts. OpenCL is the leading industry effort in improving performance-programmability. But while it is recognized that optimizing OpenCL code using published best practices is critical to achieving good performance, even optimized […]
Feb, 10

AXC: A new format to perform the SpMV oriented to Intel Xeon Phi architecture in OpenCL

Emerging new architectures used in High Performance Computing require new research to adapt and optimise algorithms to them. As part of this effort, we propose the new AXC format to improve the performance of the SpMV product for the Intel Xeon Phi coprocessor. The performance of the OpenCL kernel, based on our new format, is compared with […]
Jan, 20

Exploring FPGA-specific Optimizations for Irregular OpenCL Applications

OpenCL is emerging as a high-level hardware description language to address the productivity challenges of developing applications on FPGAs. Unlike traditional hardware description languages (HDLs), OpenCL provides an abstract interface to facilitate high productivity, enabling end users to rapidly describe the required computations, including parallelism and data movement, to create custom hardware accelerators for their […]
Jan, 13

Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task-based parallel applications that allows the execution of OpenCL kernels on different compute devices. However, it […]
Jan, 13

HG-Caffe: Mobile and Embedded Neural Network GPU (OpenCL) Inference Engine with FP16 Supporting

Breakthroughs in the fields of deep learning and mobile system-on-chips are radically changing the way we use our smartphones. However, deep neural network inference is still a challenging task for edge AI devices due to the computational overhead on mobile CPUs and a severe drain on the batteries. In this paper, we present a deep […]
Jan, 6

Towards Automatic Transformation of Legacy Scientific Code into OpenCL for Optimal Performance on FPGAs

There is a large body of legacy scientific code written in languages like Fortran that is not optimised to get the best performance out of heterogeneous acceleration devices like GPUs and FPGAs, and manually porting such code into parallel language frameworks like OpenCL requires considerable effort. We are working towards developing a turn-key, self-optimising compiler […]
Dec, 29

7th International Workshop on OpenCL, 2019

IWOCL is the annual gathering of the international community of OpenCL, SYCL and SPIR developers, researchers, suppliers and members of the Khronos Working Groups to share best practice, and to promote the evolution and advancement of the standard. The meeting is open to anyone who is interested in contributing to and participating in the community and […]
Dec, 16

Performance Analysis of a Stereo Matching Implementation in OpenCL

Stereo matching is one of the first steps in the process of calculating 3D information from two 2D images. To triangulate a 3D point from two corresponding 2D features, the displacement in pixels, or the so-called disparity, must be estimated. From the estimated per-pixel disparity, using a projective camera model, 3D data for large portions […]
Dec, 16

Developing acquisition systems based on FPGA with OpenCL

Nuclear fusion is a phenomenon in which hydrogen nuclei collide and fuse, producing helium. The resulting nucleus is heavier than each of the hydrogen nuclei, but lighter than the sum of the masses of the nuclei involved in the process. This phenomenon releases huge amounts of energy. The research group i2a2 develops the […]
Dec, 9

Optimization of a discontinuous finite element solver with OpenCL and StarPU

schnaps is a finite element solver designed to simulate various physical phenomena. It is designed to run on hybrid computers made of several CPUs and GPUs. To address hybrid architectures, we rely on the StarPU runtime. StarPU allows a sequential algorithm to be optimized incrementally in order to migrate to […]
Nov, 25

SWIFOLD: Smith-Waterman implementation on FPGA with OpenCL for long DNA sequences

BACKGROUND: The Smith-Waterman (SW) algorithm is the best choice for searching similar regions between two DNA or protein sequences. However, it may become impracticable in some contexts due to its high computational demands. Consequently, the computer science community has focused on the use of modern parallel architectures such as Graphics Processing Units (GPUs), Xeon Phi […]

* * *

HGPU group © 2010-2020 hgpu.org

All rights belong to the respective authors
