Posts

Jan, 27

Direct N-body code on low-power embedded ARM GPUs

This work arises within the ExaNeSt project, which aims to design and develop an exascale-ready supercomputer with a low energy-consumption profile that can still support the most demanding scientific and technical applications. The ExaNeSt compute unit consists of densely packed low-power 64-bit ARM processors embedded within Xilinx FPGA SoCs. SoC boards are […]
Jan, 20

Exploring FPGA-specific Optimizations for Irregular OpenCL Applications

OpenCL is emerging as a high-level hardware description language to address the productivity challenges of developing applications on FPGAs. Unlike traditional hardware description languages (HDLs), OpenCL provides an abstract interface to facilitate high productivity, enabling end users to rapidly describe the required computations, including parallelism and data movement, to create custom hardware accelerators for their […]
Jan, 20

AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning

The performance of the code generated by a compiler depends on the order in which the optimization passes are applied. In the context of high-level synthesis, the quality of the generated circuit relates directly to the code generated by the front-end compiler. Unfortunately, choosing a good order (often referred to as the phase-ordering problem) is an NP-hard […]
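To make the phase-ordering problem concrete, here is a toy sketch, not AutoPhase's method (which, per its title, uses deep reinforcement learning). It assumes a hypothetical instruction list and three hypothetical passes (`strength_reduce`, `fuse_shifts`, `dead_code_elim`) whose results depend on the order they run in, and it brute-forces all n! orderings, which only works for a handful of passes.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Program = List[str]
Pass = Callable[[Program], Program]


def strength_reduce(prog: Program) -> Program:
    """Hypothetical pass: replace multiply-by-two with a cheaper shift-left-by-one."""
    return ["shl1" if op == "mul2" else op for op in prog]


def fuse_shifts(prog: Program) -> Program:
    """Hypothetical pass: merge two *adjacent* shl1 instructions into one shl2."""
    out: Program = []
    i = 0
    while i < len(prog):
        if i + 1 < len(prog) and prog[i] == "shl1" and prog[i + 1] == "shl1":
            out.append("shl2")
            i += 2
        else:
            out.append(prog[i])
            i += 1
    return out


def dead_code_elim(prog: Program) -> Program:
    """Hypothetical pass: drop no-op instructions."""
    return [op for op in prog if op != "nop"]


PASSES: Dict[str, Pass] = {
    "strength_reduce": strength_reduce,
    "fuse_shifts": fuse_shifts,
    "dce": dead_code_elim,
}


def cost(prog: Program) -> int:
    # Toy cost model: one unit per remaining instruction.
    return len(prog)


def search_orders(prog: Program) -> List[Tuple[Tuple[str, ...], int]]:
    """Apply every ordering of the passes (n! of them) and sort by resulting cost."""
    results = []
    for order in permutations(PASSES):
        p = prog
        for name in order:
            p = PASSES[name](p)
        results.append((order, cost(p)))
    return sorted(results, key=lambda r: r[1])


if __name__ == "__main__":
    program = ["mul2", "nop", "mul2", "add"]
    for order, c in search_orders(program):
        print(c, " -> ".join(order))
```

Fusion only pays off if it runs after both strength reduction (so there are shifts to fuse) and dead-code elimination (so the shifts become adjacent); every other ordering leaves an extra instruction behind.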
Jan, 20

Automatic acceleration of Numpy applications on GPUs and multicore CPUs

Frameworks like Numpy are a popular choice for application developers in fields ranging from image processing to bioinformatics to machine learning. Numpy is often used to develop prototypes or for deployment since it provides efficient implementations of operations on arrays. Such an approach requires every operation to be executed eagerly. The result of each […]
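The cost of eager execution can be sketched without NumPy itself. In the minimal, dependency-free illustration below (the names `eager` and `Lazy` are hypothetical, and this is not the paper's system), the eager version allocates a full temporary list for every intermediate result, while the deferred version records the expression and evaluates it in one fused loop.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

Number = float


def eager(a: Number, x: Sequence[Number], y: Sequence[Number], z: Sequence[Number]) -> List[Number]:
    """Compute a*x + y + z eagerly: each step materializes a full temporary."""
    t1 = [a * v for v in x]               # temporary #1: a*x
    t2 = [p + q for p, q in zip(t1, y)]   # temporary #2: a*x + y
    return [p + q for p, q in zip(t2, z)]


@dataclass(frozen=True)
class Lazy:
    """A deferred elementwise expression; nothing is computed until evaluate()."""
    length: int
    at: Callable[[int], Number]  # computes element i of the result on demand

    @staticmethod
    def wrap(data: Sequence[Number]) -> Lazy:
        return Lazy(len(data), lambda i: data[i])

    def _combine(self, other: Union[Lazy, Number], fn: Callable[[Number, Number], Number]) -> Lazy:
        if isinstance(other, Lazy):
            if other.length != self.length:
                raise ValueError("length mismatch")
            return Lazy(self.length, lambda i: fn(self.at(i), other.at(i)))
        return Lazy(self.length, lambda i: fn(self.at(i), other))

    def __add__(self, other: Union[Lazy, Number]) -> Lazy:
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other: Union[Lazy, Number]) -> Lazy:
        return self._combine(other, lambda a, b: a * b)

    __rmul__ = __mul__  # elementwise multiply is commutative

    def evaluate(self) -> List[Number]:
        # One fused pass: each output element is computed directly,
        # with no intermediate lists for the sub-expressions.
        return [self.at(i) for i in range(self.length)]


if __name__ == "__main__":
    x, y, z = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]
    expr = 2.0 * Lazy.wrap(x) + Lazy.wrap(y) + Lazy.wrap(z)  # builds, does not compute
    print(eager(2.0, x, y, z), expr.evaluate())
```

Both paths produce the same values; the difference lies in how many intermediate arrays are allocated and traversed, which is what deferring execution lets a framework optimize away.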
Jan, 20

A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities

With the rapid development of deep learning, neural network algorithms have been widely used in various fields, e.g., image, video and voice processing. However, neural network models keep growing larger, as reflected in the number of model parameters that must be computed. Although a wealth of existing efforts on GPU platforms currently […]
Jan, 20

Tango: A Deep Neural Network Benchmark Suite for Various Accelerators

Deep neural networks (DNNs) have proven effective in various computing fields. To provide more efficient computing platforms for DNN applications, it is essential to have evaluation environments that include assorted benchmark workloads. Though a few DNN benchmark suites have been released recently, most of them require installing proprietary DNN libraries or resource-intensive […]
Jan, 13

Vulkan 1.1.97 – A Specification (with all registered Vulkan extensions)

This document, referred to as the "Vulkan Specification", describes the Vulkan Application Programming Interface (API). Vulkan is a C99 API designed for explicit control of low-level graphics and compute functionality. The Vulkan specification is intended for use by both implementors of the API and application developers seeking to make use of the API, forming a […]
Jan, 13

Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

Heterogeneous systems have become increasingly prominent in recent years. The nodes of the most powerful computers integrate several compute accelerators, such as GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task-based parallel applications that allows the execution of OpenCL kernels on different compute devices. However, it […]
Jan, 13

Exact Selectivity Computation for Modern In-Memory Database Query Optimization

Selectivity estimation remains a critical task in query optimization even after decades of research and industrial development. Optimizers rely on accurate selectivities when generating execution plans, and they maintain a large range of statistical synopses for estimating selectivities efficiently. Nonetheless, small errors, propagated exponentially, can lead to severely suboptimal plans, especially for complex predicates. Database […]
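One common way small errors compound is the independence assumption, under which an optimizer multiplies per-predicate selectivities. The sketch below assumes that model (the teaser does not say which model the paper studies, and this is not its exact-selectivity technique): if each of k estimates is off by a factor q, the combined row estimate is off by q^k, exponentially in k. The function names and numbers are hypothetical.

```python
import math
from typing import Sequence


def rows_under_independence(table_rows: int, selectivities: Sequence[float]) -> float:
    """Row estimate for a conjunction of predicates, assuming the predicates are
    independent so their selectivities can simply be multiplied."""
    return table_rows * math.prod(selectivities)


def compounded_error(per_predicate_factor: float, k: int) -> float:
    """If each of k selectivity estimates is off by the same factor q,
    the product (and hence the row estimate) is off by q**k."""
    return per_predicate_factor ** k


if __name__ == "__main__":
    true_sel = [0.1] * 5  # hypothetical true selectivities of five predicates
    est_sel = [0.2] * 5   # each estimate off by a factor of 2
    true_rows = rows_under_independence(1_000_000, true_sel)
    est_rows = rows_under_independence(1_000_000, est_sel)
    print(round(true_rows), round(est_rows), compounded_error(2.0, 5))
```

A per-predicate error of only 2x becomes a 32x misestimate after five predicates, which is enough to flip, for example, a join-order or access-path decision.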
Jan, 13

BitCracker: BitLocker meets GPUs

BitLocker is a full-disk encryption feature available in recent Windows versions. It is designed to protect data by providing encryption for entire volumes and it makes use of a number of different authentication methods. In this paper we present a solution, named BitCracker, to attempt the decryption, by means of a dictionary attack, of memory […]
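The general shape of a dictionary attack is simple: derive a key from each candidate password and compare it with the target. The sketch below is generic and assumes PBKDF2-HMAC-SHA256 as a stand-in key-derivation function. It is *not* BitLocker's actual key derivation or volume format, and the function names are hypothetical. Each candidate is checked independently of the others, which is the property that lets this kind of search be spread across many GPU threads.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def derive_key(password: str, salt: bytes, iterations: int) -> bytes:
    # Stand-in KDF (PBKDF2-HMAC-SHA256); NOT BitLocker's real key derivation.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def dictionary_attack(target_key: bytes, salt: bytes, iterations: int,
                      wordlist: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose derived key matches target_key, else None.
    Every iteration is independent of the others, so the loop parallelizes trivially."""
    for candidate in wordlist:
        if hmac.compare_digest(derive_key(candidate, salt, iterations), target_key):
            return candidate
    return None


if __name__ == "__main__":
    salt, iterations = b"demo-salt", 1000  # hypothetical parameters
    target = derive_key("hunter2", salt, iterations)
    print(dictionary_attack(target, salt, iterations, ["password", "123456", "hunter2"]))
```

The iteration count is what makes each guess expensive; a high count slows every candidate equally, so the attacker's throughput depends directly on how many derivations the hardware can run in parallel.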
Jan, 13

HG-Caffe: Mobile and Embedded Neural Network GPU (OpenCL) Inference Engine with FP16 Supporting

Breakthroughs in the fields of deep learning and mobile systems-on-chip are radically changing the way we use our smartphones. However, deep neural network inference is still a challenging task for edge AI devices due to the computational overhead on mobile CPUs and the severe drain on batteries. In this paper, we present a deep […]
Jan, 6

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing

With the pursuit of improving compute performance under strict power constraints, there is an increasing need for deploying applications to heterogeneous hardware architectures with accelerators, such as GPUs and FPGAs. However, although these heterogeneous computing platforms are becoming widely available, they are very difficult to program, especially FPGAs. As a result, the use of […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors