
Posts

Jan, 6

Vector and Line Quantization for Billion-scale Similarity Search on GPUs

Billion-scale high-dimensional approximate nearest neighbour (ANN) search has become an important problem for searching for similar objects among the vast amounts of images and videos available online. Existing ANN methods are usually characterized by their specific indexing structures, including the inverted index and the inverted multi-index. The inverted index structure is amenable to GPU-based implementations, […]
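As a rough sketch of the inverted-index idea referred to above: a coarse quantizer assigns every database vector to its nearest centroid, and a query scans only the few cells whose centroids lie closest. The numpy sketch below is illustrative only; the centroid selection, sizes, and `nprobe` value are made-up stand-ins, not the paper's method.

```python
import numpy as np

def build_inverted_index(data, centroids):
    """Assign each vector to its nearest centroid (coarse quantizer)."""
    # distances: (n_vectors, n_centroids)
    d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d.argmin(axis=1)
    # inverted lists: centroid id -> indices of the vectors in that cell
    return {c: np.where(assign == c)[0] for c in range(len(centroids))}

def search(query, data, centroids, index, nprobe=2):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    d = ((centroids - query) ** 2).sum(-1)
    cells = d.argsort()[:nprobe]
    cand = np.concatenate([index[c] for c in cells])
    dist = ((data[cand] - query) ** 2).sum(-1)
    return cand[dist.argmin()]

rng = np.random.default_rng(0)
data = rng.normal(size=(2_000, 16)).astype(np.float32)
centroids = data[rng.choice(len(data), 32, replace=False)]  # stand-in for k-means
index = build_inverted_index(data, centroids)
print(search(data[42], data, centroids, index))  # finds 42 in its own cell
```

The per-cell scans are independent, which is what makes this layout map well onto GPU thread blocks.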
Jan, 6

HetExchange: Encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines

Modern server hardware is increasingly heterogeneous as hardware accelerators, such as GPUs, are used together with multicore CPUs to meet the computational demands of modern data analytics workloads. Unfortunately, query parallelization techniques used by analytical database engines are designed for homogeneous multicore servers, where query plans are parallelized across CPUs to process data stored in […]
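As a loose illustration of the exchange idea (not the paper's HetExchange operators, which are JIT-compiled into the query plan): an exchange decouples producers from heterogeneous consumers through a shared queue, so each device pulls work at its own pace and faster devices naturally take more of it. A minimal Python sketch, with plain functions standing in for devices:

```python
import queue, threading

def exchange(producer, consumers):
    """Minimal exchange operator: one producer feeds a shared queue;
    heterogeneous consumers (e.g. one per CPU socket or GPU) pull
    blocks at their own pace."""
    q = queue.Queue(maxsize=8)
    results, lock = [], threading.Lock()

    def run(consume):
        while True:
            block = q.get()
            if block is None:   # poison pill: no more input
                q.put(None)     # propagate to the remaining consumers
                return
            r = consume(block)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=run, args=(c,)) for c in consumers]
    for t in threads:
        t.start()
    for block in producer:
        q.put(block)
    q.put(None)
    for t in threads:
        t.join()
    return results

# Two "devices" processing 4-tuple blocks, as a stand-in for CPU vs. GPU.
blocks = [list(range(i, i + 4)) for i in range(0, 32, 4)]
print(sorted(exchange(iter(blocks), [sum, sum])))
```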
Jan, 6

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers

To facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), two important categories of DNN model compression techniques are investigated: weight pruning and weight quantization. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in the bit representation of weights. However, there is no systematic framework of […]
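As a hedged sketch of the two projection steps that an ADMM formulation of this kind alternates with regularized training: pruning projects the weights onto a sparsity constraint set (keep the largest-magnitude entries), and quantization projects them onto a fixed set of levels. The shapes, `k`, and levels below are illustrative, not from the paper.

```python
import numpy as np

def project_sparsity(W, k):
    """Euclidean projection onto {W : at most k nonzeros}: keep the
    k largest-magnitude entries (ties may keep a few more), zero the rest."""
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

def project_quantization(W, levels):
    """Euclidean projection onto a fixed set of quantization levels:
    snap each weight to its nearest level."""
    levels = np.asarray(levels)
    idx = np.abs(W[..., None] - levels).argmin(-1)
    return levels[idx]

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
print(project_sparsity(W, k=5))
print(project_quantization(W, levels=[-0.5, 0.0, 0.5]))
```

Both projections are cheap and closed-form, which is what makes the alternating scheme attractive compared with solving the constrained problem directly.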
Jan, 6

Towards Automatic Transformation of Legacy Scientific Code into OpenCL for Optimal Performance on FPGAs

There is a large body of legacy scientific code written in languages like Fortran that is not optimised to get the best performance out of heterogeneous acceleration devices like GPUs and FPGAs, and manually porting such code into parallel language frameworks like OpenCL requires considerable effort. We are working towards developing a turn-key, self-optimising compiler […]
Dec, 30

A Survey on Optimized Implementation of Deep Learning Models on the NVIDIA Jetson Platform

Design of hardware accelerators for neural network (NN) applications involves walking a tightrope amidst the constraints of low power, high accuracy and high throughput. NVIDIA’s Jetson is a promising platform for embedded machine learning which seeks to achieve a balance between the above objectives. In this paper, we provide a survey of works that evaluate and […]
Dec, 30

Automatic Performance Optimization on Heterogeneous Computer Systems using Manycore Coprocessors

Emerging computer architectures and advanced computing technologies, such as Intel’s Many Integrated Core (MIC) Architecture and graphics processing units (GPUs), provide a promising solution to employ parallelism for achieving high performance, scalability and low power consumption. As a result, accelerators have become a crucial part in developing supercomputers. Accelerators are usually equipped with different types of […]
Dec, 30

A Study on the Acceleration of Arrival Curve Construction and Regular Specification Mining using GPUs

Data analytics is a process of examining datasets using various analytical and statistical techniques. Several tools have been proposed in the literature to extract hidden patterns, gather insights and build mathematical models from large datasets. However, these tools have been known to be computationally demanding as the datasets become larger over time. Two such recently […]
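For context on the first of the two analyses named in the title: an (upper) arrival curve maps each window length to the maximum number of events observed in any window of that length in a trace. The naive baseline below is quadratic in the trace length, the kind of cost that motivates GPU acceleration; the trace and window lengths are made up.

```python
import bisect

def arrival_curve(timestamps, deltas):
    """Upper arrival curve of an event trace: for each window length d,
    the maximum number of events in any window of that length."""
    ts = sorted(timestamps)
    curve = {}
    for d in deltas:
        best = 0
        for i, t in enumerate(ts):
            # count events in the half-open window [t, t + d)
            j = bisect.bisect_left(ts, t + d, lo=i)
            best = max(best, j - i)
        curve[d] = best
    return curve

trace = [0.0, 0.1, 0.15, 1.0, 1.05, 2.5]
print(arrival_curve(trace, deltas=[0.2, 1.0, 3.0]))  # {0.2: 3, 1.0: 4, 3.0: 6}
```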
Dec, 30

Speeding-up the Verification Phase of Set Similarity Joins in the GPGPU paradigm

We investigate the problem of exact set similarity joins using a co-processing CPU-GPU scheme. The state-of-the-art CPU solutions split the work in two main phases. First, filtering and index building take place to reduce the candidate sets to be compared as much as possible; then the pairs are compared to verify whether they should become […]
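A minimal sketch of what the verification phase decides for one candidate pair, assuming sorted token lists and a Jaccard threshold; the early-termination bound is a standard trick in this literature, not necessarily the paper's exact GPU kernel.

```python
def verify(a, b, threshold):
    """Verification step of a set similarity join: decide whether two
    sorted token lists meet a Jaccard threshold, stopping early once
    the required overlap can no longer be reached."""
    # Jaccard(a, b) >= t  <=>  |a & b| >= t / (1 + t) * (|a| + |b|)
    required = threshold / (1.0 + threshold) * (len(a) + len(b))
    overlap, i, j = 0, 0, 0
    while i < len(a) and j < len(b):
        # the remaining elements bound the best overlap still achievable
        if overlap + min(len(a) - i, len(b) - j) < required:
            return False
        if a[i] == b[j]:
            overlap += 1; i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return overlap >= required

print(verify([1, 2, 3, 5, 8], [1, 2, 3, 4, 8], threshold=0.6))  # True: J = 4/6
```

Since each candidate pair is verified independently, this phase is embarrassingly parallel, which is why it is a natural target for GPU offloading.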
Dec, 30

ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. […]
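As a generic, deliberately simplified illustration of adapting computation to a latency budget with predictors (the predictor functions below are toy stand-ins, not ChamNet's fitted models):

```python
def adapt_width(widths, predict_latency, predict_accuracy, budget_ms):
    """Pick the configuration with the best predicted accuracy whose
    predicted latency fits the budget -- the generic idea behind
    platform-aware model adaptation."""
    feasible = [w for w in widths if predict_latency(w) <= budget_ms]
    return max(feasible, key=predict_accuracy) if feasible else None

# Hypothetical predictors: latency grows ~quadratically with width,
# accuracy saturates; real systems fit these from on-device measurements.
lat = lambda w: 4.0 * w * w
acc = lambda w: 1.0 - 0.5 / w
print(adapt_width([0.5, 0.75, 1.0, 1.25], lat, acc, budget_ms=5.0))  # -> 1.0
```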
Dec, 29

7th International Workshop on OpenCL, 2019

IWOCL is the annual gathering of the international community of OpenCL, SYCL and SPIR developers, researchers, suppliers and members of the Khronos Working Groups, held to share best practice and to promote the evolution and advancement of the standard. The meeting is open to anyone who is interested in contributing to and participating in the community and […]
Dec, 29

Distributed Heterogeneous Programming in C/C++ (DHPCC++), 2019

This will be the 3rd DHPCC++ event, held in partnership with IWOCL, the international OpenCL workshop, with a focus on heterogeneous programming models for C and C++, covering all the programming models that have been designed to support heterogeneous programming in C and C++. Many C++ programming models exist, including SYCL, HPX, Kokkos, Raja, C++ AMP, HCC, […]
Dec, 23

wav2letter++: The Fastest Open-source Speech Recognition System

This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than […]
