
Posts

Mar, 8

ADWPNAS: Architecture-Driven Weight Prediction for Neural Architecture Search

How to discover and evaluate the true strength of models quickly and accurately is one of the key challenges in Neural Architecture Search (NAS). To address this problem, we propose an Architecture-Driven Weight Prediction (ADWP) approach for NAS. In our approach, we first design an architecture-intensive search space and then train […]
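To make the idea concrete, below is a minimal, hypothetical sketch of architecture-driven weight prediction: an architecture encoding is fed to a small meta-network that outputs the weights of a candidate model, so candidates can be scored without training each one from scratch. All names and sizes are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of architecture-driven weight prediction (not the paper's code):
# an architecture encoding is mapped by a small meta-MLP to the weights of a
# candidate network, so candidates can be evaluated without per-candidate training.
import numpy as np

rng = np.random.default_rng(0)

def encode_architecture(ops, num_op_types=4):
    """One-hot encode a list of layer operation ids into a flat vector."""
    enc = np.zeros((len(ops), num_op_types))
    enc[np.arange(len(ops)), ops] = 1.0
    return enc.ravel()

# Meta-network parameters (would be trained over the search space in practice).
ENC_DIM, HIDDEN, OUT_SHAPE = 12, 32, (8, 8)   # assumed sizes for illustration
W1 = rng.normal(scale=0.1, size=(ENC_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, OUT_SHAPE[0] * OUT_SHAPE[1]))

def predict_weights(arch_encoding):
    """Predict a candidate network's weight matrix from its architecture encoding."""
    h = np.tanh(arch_encoding @ W1)
    return (h @ W2).reshape(OUT_SHAPE)

arch = encode_architecture([0, 2, 1])   # a 3-layer candidate architecture
weights = predict_weights(arch)         # predicted 8x8 weight matrix
print(weights.shape)
```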
Mar, 8

Fast Gunrock Subgraph Matching (GSM) on GPUs

In this paper, we propose a novel method, GSM (Gunrock Subgraph Matching), to compute graph matching (subgraph isomorphism) on GPUs. In contrast to previous approaches, GSM is BFS-based: possible matches are explored simultaneously in a breadth-first strategy and thus can be mapped onto GPUs in a massively parallel fashion. Our implementation on the Gunrock graph […]
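As a rough illustration of the BFS-based strategy (a sequential Python stand-in for the GPU frontier expansion, not the Gunrock/GSM code), each level extends every partial match by one query vertex, which is what allows all partial matches at a level to be processed in parallel:

```python
# Minimal BFS-style subgraph matching sketch: level-by-level extension of
# partial matches, the pattern that maps naturally onto GPU threads.
from collections import defaultdict

def bfs_subgraph_match(data_edges, query_edges, query_order):
    """Enumerate mappings of query vertices (in query_order) onto the data graph."""
    data_adj, query_adj = defaultdict(set), defaultdict(set)
    for u, v in data_edges:
        data_adj[u].add(v); data_adj[v].add(u)
    for u, v in query_edges:
        query_adj[u].add(v); query_adj[v].add(u)

    frontier = [{}]                       # partial matches at the current level
    for q in query_order:
        next_frontier = []
        for partial in frontier:
            matched = [partial[p] for p in query_adj[q] if p in partial]
            candidates = (set.intersection(*(data_adj[m] for m in matched))
                          if matched else set(data_adj))
            for c in candidates:
                if c not in partial.values():      # keep the mapping injective
                    next_frontier.append({**partial, q: c})
        frontier = next_frontier
    return frontier

# Triangle query on a small data graph.
data = [(0, 1), (1, 2), (2, 0), (2, 3)]
query = [("a", "b"), ("b", "c"), ("c", "a")]
print(bfs_subgraph_match(data, query, ["a", "b", "c"]))
```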
Mar, 8

Inline Vector Compression for Computational Physics

A novel inline data compression method is presented for single-precision vectors in three dimensions. The primary application of the method is for accelerating computational physics calculations where the throughput is bound by memory bandwidth. The scheme employs spherical polar coordinates, angle quantisation, and a bespoke floating-point representation of the magnitude to achieve a fixed compression […]
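The following sketch shows the general shape of such a scheme: convert the vector to spherical polar coordinates, quantise the two angles, and store the magnitude in a reduced-precision float. The bit widths and the use of float16 for the magnitude are assumptions for illustration, not the paper's bespoke format.

```python
# Rough sketch: store a 3-D vector as (quantized theta, quantized phi,
# reduced-precision magnitude). Bit widths here are illustrative only.
import numpy as np

ANGLE_BITS = 10                      # assumed quantisation resolution per angle

def compress(v):
    x, y, z = (float(c) for c in v)
    r = np.sqrt(x * x + y * y + z * z)
    theta = np.arccos(z / r) if r > 0 else 0.0            # polar angle, [0, pi]
    phi = np.arctan2(y, x)                                 # azimuth, [-pi, pi]
    q_theta = int(round(theta / np.pi * (2**ANGLE_BITS - 1)))
    q_phi = int(round((phi + np.pi) / (2 * np.pi) * (2**ANGLE_BITS - 1)))
    r_small = np.float16(r)          # stand-in for the bespoke magnitude format
    return q_theta, q_phi, r_small

def decompress(q_theta, q_phi, r_small):
    theta = q_theta / (2**ANGLE_BITS - 1) * np.pi
    phi = q_phi / (2**ANGLE_BITS - 1) * 2 * np.pi - np.pi
    r = float(r_small)
    return np.array([r * np.sin(theta) * np.cos(phi),
                     r * np.sin(theta) * np.sin(phi),
                     r * np.cos(theta)], dtype=np.float32)

v = np.array([0.3, -1.2, 0.7], dtype=np.float32)
print(v, decompress(*compress(v)))
```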
Mar, 1

AvA: Accelerated Virtualization of Accelerators

Applications are migrating en masse to the cloud, while accelerators such as GPUs, TPUs, and FPGAs proliferate in the wake of Moore’s Law. These trends are in conflict: cloud applications run on virtual platforms, but existing virtualization techniques have not provided production-ready solutions for accelerators. As a result, cloud providers expose accelerators by dedicating physical […]
Mar, 1

Accelerating CNN on FPGA: An Implementation of MobileNet on FPGA

The Convolutional Neural Network (CNN) is a deep learning algorithm that has had a revolutionary impact on computer vision; one of its applications is image classification. However, CNNs involve a huge number of operations and parameters, which limits their use in time- and resource-constrained embedded applications. MobileNet, a neural network that uses […]
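MobileNet's central building block is the depthwise separable convolution, which factors a standard convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution, sharply cutting the operation and parameter count. A plain NumPy sketch of one such layer (shapes chosen for illustration; no padding or stride handling):

```python
# Depthwise separable convolution, the building block MobileNet is based on,
# written out in plain NumPy for illustration.
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """x: (H, W, C_in); dw_kernels: (k, k, C_in); pw_kernels: (C_in, C_out)."""
    H, W, C_in = x.shape
    k = dw_kernels.shape[0]
    Ho, Wo = H - k + 1, W - k + 1

    # Depthwise step: one k x k filter per input channel, no cross-channel mixing.
    dw = np.zeros((Ho, Wo, C_in), dtype=x.dtype)
    for c in range(C_in):
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i + k, j:j + k, c] * dw_kernels[:, :, c])

    # Pointwise step: a 1 x 1 convolution mixes the channels.
    return dw @ pw_kernels        # shape (Ho, Wo, C_out)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 3)).astype(np.float32)
out = depthwise_separable_conv(x,
                               rng.normal(size=(3, 3, 3)).astype(np.float32),
                               rng.normal(size=(3, 16)).astype(np.float32))
print(out.shape)   # (6, 6, 16)
```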
Mar, 1

Telekine: Secure Computing with Cloud GPUs

GPUs have become ubiquitous in the cloud due to the dramatic performance gains they enable in domains such as machine learning and computer vision. However, offloading GPU computation to the cloud requires placing enormous trust in providers and administrators. Recent proposals for GPU trusted execution environments (TEEs) are promising but fail to address very real […]
Mar, 1

Evaluating the Energy Efficiency of OpenCL-accelerated AutoDock Molecular Docking

AutoDock is a molecular docking application that consists of a genetic algorithm coupled with the Solis-Wets local-search method. Despite its wide usage, its power consumption on heterogeneous systems has not been evaluated extensively. In this work, we evaluate the energy efficiency of an OpenCL-accelerated version of AutoDock that, along with the traditional Solis-Wets method, newly […]
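For reference, a Solis-Wets-style local search perturbs the current solution with an adaptive step size and a bias vector, accepting only improving moves. The sketch below is a generic simplification of that idea on a toy scoring function, not the AutoDock OpenCL implementation:

```python
# Simplified Solis-Wets-style local search on a generic scoring function.
import numpy as np

def solis_wets(score, x0, iters=200, rho=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x, best = x0.copy(), score(x0)
    bias = np.zeros_like(x0)
    successes = failures = 0
    for _ in range(iters):
        step = rng.normal(scale=rho, size=x.shape) + bias
        for candidate, new_bias in ((x + step, 0.4 * step + 0.2 * bias),
                                    (x - step, bias - 0.4 * step)):
            s = score(candidate)
            if s < best:                       # accept an improving move
                x, best, bias = candidate, s, new_bias
                successes, failures = successes + 1, 0
                break
        else:                                  # neither direction improved
            bias *= 0.5
            successes, failures = 0, failures + 1
        if successes >= 5:                     # adapt the step size
            rho, successes = rho * 2.0, 0
        elif failures >= 5:
            rho, failures = rho * 0.5, 0
    return x, best

# Toy "docking score": squared distance to a known optimum.
score = lambda v: float(np.sum((v - np.array([1.0, -2.0, 0.5])) ** 2))
print(solis_wets(score, np.zeros(3)))
```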
Mar, 1

A Systematic Survey of General Sparse Matrix-Matrix Multiplication

SpGEMM (General Sparse Matrix-Matrix Multiplication) has attracted much attention from researchers in the fields of multigrid methods and graph analysis. Many optimization techniques have been developed for particular application fields and computing architectures over the decades. The objective of this paper is to provide a structured and comprehensive overview of the research on SpGEMM. Existing optimization […]
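As a baseline for the kinds of optimization such a survey covers, the row-by-row (Gustavson-style) formulation of SpGEMM on CSR inputs looks roughly as follows; the sparse accumulator used for each output row is where implementations differ most:

```python
# Row-by-row (Gustavson-style) SpGEMM on CSR matrices, the common baseline
# formulation; a Python dict serves as the per-row sparse accumulator.
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}                                      # sparse accumulator for row i of C
        for jj in range(a_ptr[i], a_ptr[i + 1]):      # nonzeros A[i, k]
            k, a_ik = a_idx[jj], a_val[jj]
            for kk in range(b_ptr[k], b_ptr[k + 1]):  # nonzeros B[k, j]
                j = b_idx[kk]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[kk]
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val

# A = [[1, 0], [2, 3]], B = [[0, 4], [5, 0]] in CSR form; C = A @ B.
print(spgemm_csr([0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0],
                 [0, 1, 2], [1, 0], [4.0, 5.0]))
```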
Feb, 23

Performance Counters based Power Modeling of Mobile GPUs using Deep Learning

GPUs have recently become important computational units on mobile devices, resulting in heterogeneous devices that can run a variety of parallel processing applications. While developing and optimizing such applications, estimating power consumption is of immense importance as energy efficiency has become the key design constraint to optimize for on these platforms. In this work, we […]
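The modeling task itself amounts to regressing measured power from sampled performance counters. The sketch below fits a small MLP on synthetic data; the counter names, data, and network size are placeholders rather than the paper's setup:

```python
# Sketch: regress GPU power from performance-counter samples with a small MLP.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic "counter" samples: [gpu_utilization, mem_bandwidth_GBs, alu_ops, freq_mhz]
X = rng.uniform(size=(2000, 4)) * np.array([100.0, 25.0, 1e9, 900.0])
true_power = 0.8 + 0.02 * X[:, 0] + 0.05 * X[:, 1] + 1e-9 * X[:, 2] + 0.001 * X[:, 3]
y = true_power + rng.normal(scale=0.05, size=2000)      # "measured" power (W) + noise

scaler = StandardScaler().fit(X[:1500])
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
model.fit(scaler.transform(X[:1500]), y[:1500])

pred = model.predict(scaler.transform(X[1500:]))
print("mean abs error (W):", np.mean(np.abs(pred - y[1500:])))
```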
Feb, 23

Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs

Graphics processing units (GPUs) are prevalent in modern computing systems at all scales and consume a significant fraction of the energy in these systems. However, vendors do not publish the actual power/energy cost of their internal microarchitecture. In this paper, we accurately measure the energy consumption of various instructions found in modern […]
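A coarse version of such a measurement samples board power through NVML while a microbenchmark runs and integrates it over time; instruction-level attribution as done in the paper requires far more careful microbenchmark construction. The sketch below assumes an NVIDIA GPU and the pynvml bindings:

```python
# Sketch: estimate the energy of a workload by sampling GPU board power via NVML
# and integrating over time (a coarse illustration, not the paper's verified
# instruction-level methodology).
import threading
import time

import pynvml

def measure_energy(workload, interval_s=0.01):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    samples, done = [], threading.Event()

    def sampler():
        while not done.is_set():
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports mW
            samples.append((time.time(), watts))
            time.sleep(interval_s)

    t = threading.Thread(target=sampler)
    t.start()
    workload()                       # the kernel / microbenchmark under test
    done.set()
    t.join()
    pynvml.nvmlShutdown()

    # Trapezoidal integration of power over time gives energy in joules.
    return sum((samples[i + 1][0] - samples[i][0]) * (samples[i][1] + samples[i + 1][1]) / 2
               for i in range(len(samples) - 1))

# Example: a sleep stands in for launching a GPU workload.
print("energy (J):", measure_energy(lambda: time.sleep(0.5)))
```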
Feb, 23

Let’s sort this out: GPGPU Verification of Radix Sort

This paper shows how the VerCors verification toolset can be used to prove data race freedom and functional correctness of a parallel radix sort algorithm for GPUs. This is a widely used standard sorting implementation for GPGPU programming frameworks and therefore its correctness is of utmost importance. Additionally, it presents the usefulness of VerCors as […]
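For reference, the algorithm under verification follows the standard least-significant-digit radix sort pattern (per-digit histogram, exclusive prefix sum, stable scatter) that GPU implementations parallelize. A sequential Python sketch of that pattern, not the verified GPU kernel itself:

```python
# Sequential reference of the LSD radix sort pattern: histogram, exclusive
# prefix sum, stable scatter, repeated once per digit.
def radix_sort(keys, bits_per_pass=4, key_bits=32):
    radix = 1 << bits_per_pass
    for shift in range(0, key_bits, bits_per_pass):
        # 1) Histogram of the current digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & (radix - 1)] += 1
        # 2) Exclusive prefix sum gives each bucket's starting offset.
        offsets, total = [0] * radix, 0
        for d in range(radix):
            offsets[d], total = total, total + counts[d]
        # 3) Stable scatter into the output positions.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & (radix - 1)
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```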
Feb, 23

From English To Foreign Languages: Transferring Pre-trained Language Models

Pre-trained models have demonstrated their effectiveness in many downstream natural language processing (NLP) tasks. The availability of multilingual pre-trained models enables zero-shot transfer of NLP tasks from high-resource languages to low-resource ones. However, recent research on improving pre-trained models focuses heavily on English. While it is possible to train the latest neural architectures […]


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org