Jan, 20

Tango: A Deep Neural Network Benchmark Suite for Various Accelerators

Deep neural networks (DNNs) have been proving the effectiveness in various computing fields. To provide more efficient computing platforms for DNN applications, it is essential to have evaluation environments that include assorted benchmark workloads. Though a few DNN benchmark suites have been recently released, most of them require to install proprietary DNN libraries or resource-intensive […]
Jan, 13

Vulkan 1.1.97 – A Specification (with all registered Vulkan extensions)

This document, referred to as the "Vulkan Specification" describes the Vulkan Application Programming Interface (API). Vulkan is a C99 API designed for explicit control of low-level graphics and compute functionality. The Vulkan specification is intended for use by both implementors of the API and application developers seeking to make use of the API, forming a […]
Jan, 13

Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task based parallel applications, that allows the execution of OpenCl kernels on different compute devices. However, it […]
Jan, 13

Exact Selectivity Computation for Modern In-Memory Database Query Optimization

Selectivity estimation remains a critical task in query optimization even after decades of research and industrial development. Optimizers rely on accurate selectivities when generating execution plans. They maintain a large range of statistical synopses for efficiently estimating selectivities. Nonetheless, small errors — propagated exponentially — can lead to severely sub-optimal plans—especially, for complex predicates. Database […]
Jan, 13

BitCracker: BitLocker meets GPUs

BitLocker is a full-disk encryption feature available in recent Windows versions. It is designed to protect data by providing encryption for entire volumes and it makes use of a number of different authentication methods. In this paper we present a solution, named BitCracker, to attempt the decryption, by means of a dictionary attack, of memory […]
Jan, 13

HG-Caffe: Mobile and Embedded Neural Network GPU (OpenCL) Inference Engine with FP16 Supporting

Breakthroughs in the fields of deep learning and mobile system-on-chips are radically changing the way we use our smartphones. However, deep neural networks inference is still a challenging task for edge AI devices due to the computational overhead on mobile CPUs and a severe drain on the batteries. In this paper, we present a deep […]
Jan, 6

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing

With the pursuit of improving compute performance under strict power constraints, there is an increasing need for deploying applications to heterogeneous hardware architectures with accelerators, such as GPUs and FPGAs. However, although these heterogeneous computing platforms are becoming widely available, they are very difficult to program especially with FPGAs. As a result, the use of […]
Jan, 6

Vector and Line Quantization for Billion-scale Similarity Search on GPUs

Billion-scale high-dimensional approximate nearest neighbour (ANN) search has become an important problem for searching similar objects among the vast amount of images and videos available online. The existing ANN methods are usually characterized by their specific indexing structures, including the inverted index and the inverted multi-index. The inverted index structure is amenable to GPU-based implementations, […]
Jan, 6

HetExchange: Encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines

Modern server hardware is increasingly heterogeneous as hardware accelerators, such as GPUs, are used together with multicore CPUs to meet the computational demands of modern data analytics workloads. Unfortunately, query parallelization techniques used by analytical database engines are designed for homogeneous multicore servers, where query plans are parallelized across CPUs to process data stored in […]
Jan, 6

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers

To facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), two important categories of DNN model compression techniques: weight pruning and weight quantization are investigated. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in bit representation of weights. However, there lacks a systematic framework of […]
Jan, 6

Towards Automatic Transformation of Legacy Scientific Code into OpenCL for Optimal Performance on FPGAs

There is a large body of legacy scientific code written in languages like Fortran that is not optimised to get the best performance out of heterogeneous acceleration devices like GPUs and FPGAs, and manually porting such code into parallel languages frameworks like OpenCL requires considerable effort. We are working towards developing a turn-key, self-optimising compiler […]
Dec, 30

A Survey on Optimized Implementation of Deep Learning Models on the NVIDIA Jetson Platform

Design of hardware accelerators for neural network (NN) applications involves walking a tight rope amidst the constraints of low-power, high accuracy and throughput. NVIDIA’s Jetson is a promising platform for embedded machine learning which seeks to achieve a balance between the above objectives. In this paper, we provide a survey of works that evaluate and […]

* * *

* * *

HGPU group © 2010-2019 hgpu.org

All rights belong to the respective authors

Contact us: