Posts

Jun, 5

Dropbear: Machine Learning Marketplaces made Trustworthy with Byzantine Model Agreement

Marketplaces for machine learning (ML) models are emerging as a way for organizations to monetize models. They allow model owners to retain control over hosted models by using cloud resources to execute ML inference requests for a fee, preserving model confidentiality. Clients that rely on hosted models require trustworthy inference results, even when models are […]
Jun, 5

FELARE: Fair Scheduling of Machine Learning Applications on Heterogeneous Edge Systems

Edge computing enables smart IoT-based systems via concurrent and continuous execution of latency-sensitive machine learning (ML) applications. These edge-based machine learning systems are often battery-powered (i.e., energy-limited). They use heterogeneous resources with diverse computing performance (e.g., CPU, GPU, and/or FPGAs) to fulfill the latency constraints of ML applications. The challenge is to allocate user requests […]
Jun, 5

Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

To break the bottlenecks of the mainstream cloud-based machine learning (ML) paradigm, we adopt device-cloud collaborative ML and build the first end-to-end and general-purpose system, called Walle, as the foundation. Walle consists of a deployment platform, distributing ML tasks to billion-scale devices in time; a data pipeline, efficiently preparing task input; and a compute container, providing […]
Jun, 5

End-to-end Optimization of Machine Learning Prediction Queries

Prediction queries are widely used across industries to perform advanced analytics and draw insights from data. They include a data processing part (e.g., for joining, filtering, cleaning, featurizing the datasets) and a machine learning (ML) part invoking one or more trained models to perform predictions. These parts have so far been optimized in isolation, leaving […]
May, 29

Fast GPU bounding boxes on tree-structured scenes

Computation of bounding boxes is a fundamental problem in high performance rendering, as it is an input to visibility culling and binning operations. In a scene description structured as a tree, clip nodes and blend nodes entail intersection and union of bounding boxes, respectively. These are straightforward to compute on the CPU using a sequential […]
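
As a rough illustration of the sequential CPU computation this abstract alludes to, the sketch below walks a small scene tree, taking the union of child boxes at blend nodes and intersecting with a clip rectangle at clip nodes. The `Leaf`, `Blend`, and `Clip` classes and the `bbox` function are hypothetical names chosen for this example, and the clip is assumed to be an axis-aligned rectangle; none of this comes from the paper, and it does not attempt the GPU algorithm the paper describes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# A box is (x0, y0, x1, y1); None stands for the empty box.
Box = Optional[Tuple[float, float, float, float]]


def union(a: Box, b: Box) -> Box:
    """Smallest box containing both a and b."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersect(a: Box, b: Box) -> Box:
    """Overlap of a and b, or None if they do not overlap."""
    if a is None or b is None:
        return None
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


# Hypothetical node types for illustration only.
@dataclass
class Leaf:
    box: Box


@dataclass
class Blend:
    children: List["Node"]


@dataclass
class Clip:
    clip_box: Box  # assumed to be an axis-aligned rectangle
    children: List["Node"]


Node = Union[Leaf, Blend, Clip]


def bbox(node: Node) -> Box:
    """Sequential, recursive bounding-box computation over the tree."""
    if isinstance(node, Leaf):
        return node.box
    result: Box = None
    for child in node.children:
        result = union(result, bbox(child))  # blend: union of children
    if isinstance(node, Clip):
        result = intersect(result, node.clip_box)  # clip: intersection
    return result


scene = Blend([Leaf((0, 0, 4, 4)), Clip((3, 3, 10, 10), [Leaf((2, 2, 6, 6))])])
print(bbox(scene))
```

The recursion depends on each child's result before the parent can finish, which is what makes the sequential version simple on the CPU.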
May, 29

User’s needs influencing HPC technologies

The user requirements imposed by modern challenges are influencing future High Performance Computing (HPC) technologies and use cases. This report analyses a wide range of user requirements and new technologies and their impact on European and worldwide HPC trends, in particular in the PRACE and EuroHPC ecosystems, as well as HPC infrastructures provided by member […]
May, 29

SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks

The increased interest in Artificial Intelligence (AI) raised the need for highly optimized and sophisticated AI frameworks. Starting with the Lua-based Torch, many frameworks have emerged over time, such as Theano, Caffe, Chainer, CNTK, MxNet, PyTorch, DL4J, or TensorFlow. All of these provide a high-level scripting API that allows users to easily design neural […]
May, 29

Fault Injection techniques for GPU Reliability Evaluation

A Graphical Processing Unit (GPU) is a computer chip that renders graphics and images by performing rapid mathematical calculations. In recent years, GPUs have been exploited for purposes beyond graphics processing as General Purpose GPUs (GPGPUs); they work as hardware accelerators for high-performance computing in many different fields, including safety-critical applications. In these domains, Convolutional Neural […]
May, 29

Lossless Acceleration for Seq2seq Generation with Aggressive Decoding

We study lossless acceleration for seq2seq generation with a novel decoding algorithm, Aggressive Decoding. Unlike previous efforts (e.g., non-autoregressive decoding) that speed up seq2seq generation at the cost of quality loss, our approach aims to yield identical (or better) generation compared with autoregressive decoding but with a significant speedup, achieved by innovative cooperation […]
May, 22

Blockchain Goes Green? Part II: Characterizing the Performance and Cost of Blockchains on the Cloud and at the Edge

While state-of-the-art permissioned blockchains can achieve thousands of transactions per second on commodity hardware with x86/64 architecture, their performance when running on different architectures is not clear. The goal of this work is to characterize the performance and cost of permissioned blockchains on different hardware systems, which is important as diverse application domains are adopting […]
May, 22

GPU Ray Tracing with Monte Carlo Methods

Monte Carlo methods are techniques for obtaining numerical results through simulations with random samples. The basic idea is to generate a sequence of random numbers, execute the same algorithm on each of them (individually or in groups), and then combine the resulting outputs to obtain the final result. […]
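
The generate, evaluate, and combine pattern described in this abstract can be sketched with a standard textbook example: estimating π by sampling random points in the unit square. This is not taken from the paper and has nothing to do with ray tracing specifically; the function name `estimate_pi` and its parameters are assumptions made for illustration.

```python
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate pi by Monte Carlo sampling of the unit square."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    inside = 0
    for _ in range(num_samples):
        # Step 1: generate a random sample.
        x, y = rng.random(), rng.random()
        # Step 2: run the same test on every sample.
        if x * x + y * y <= 1.0:
            inside += 1
    # Step 3: combine the per-sample outcomes into one result.
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / num_samples


print(estimate_pi(100_000))
```

Because every sample is processed independently by the same code, this style of computation maps naturally onto GPU threads, with the combine step becoming a parallel reduction.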
May, 22

The Application of AI Technology in GPU Scheduling Algorithm Optimization

With the rapid development of integrated circuit technology, GPU computing capabilities continue to improve. Due to the continuous improvement of GPU programming capabilities, functions, and performance, GPUs have been widely used in the field of high-tech general-purpose computers. This article studies the optimization of GPU scheduling algorithm based on AI […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
