Posts
May, 29
SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks
The increased interest in Artificial Intelligence (AI) has raised the need for highly optimized and sophisticated AI frameworks. Starting with the Lua-based Torch, many frameworks have emerged over time, such as Theano, Caffe, Chainer, CNTK, MxNet, PyTorch, DL4J, or TensorFlow. All of these provide a high-level scripting API that allows users to easily design neural […]
May, 29
Fault Injection techniques for GPU Reliability Evaluation
A Graphics Processing Unit (GPU) is a computer chip that renders graphics and images by performing rapid mathematical calculations. In recent years, GPUs have been exploited for purposes beyond graphics processing as General Purpose GPUs (GPGPUs); they work as hardware accelerators for high-performance computing in many different fields, including safety-critical applications. In these domains, Convolutional Neural […]
May, 29
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
We study lossless acceleration for seq2seq generation with a novel decoding algorithm: Aggressive Decoding. Unlike previous efforts (e.g., non-autoregressive decoding) that speed up seq2seq generation at the cost of quality, our approach aims to yield identical (or better) generation compared with autoregressive decoding but with a significant speedup, achieved by innovative cooperation […]
May, 22
The Application of AI Technology in GPU Scheduling Algorithm Optimization
With the rapid development of integrated circuit technology, GPU computing capabilities continue to improve. Due to the continuous improvement of GPU programming capabilities, functions, and performance, GPUs have been widely used in the field of high-tech general-purpose computers. This article studies the optimization of the GPU scheduling algorithm based on AI […]
May, 22
Blockchain Goes Green? Part II: Characterizing the Performance and Cost of Blockchains on the Cloud and at the Edge
While state-of-the-art permissioned blockchains can achieve thousands of transactions per second on commodity hardware with x86/64 architecture, their performance when running on different architectures is not clear. The goal of this work is to characterize the performance and cost of permissioned blockchains on different hardware systems, which is important as diverse application domains are adopting […]
May, 22
GPU Ray Tracing with Monte Carlo Methods
Monte Carlo methods are a family of techniques for obtaining numerical results through simulations with random samples. The basic idea is to generate a sequence of random numbers, execute the same algorithm on each of them individually or in groups, and then combine the resulting outputs to obtain the final result. […]
May, 22
AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs
In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignments in bioinformatics. However, current state-of-the-art approaches are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. We present AnySeq/GPU, a sequence alignment library that augments […]
May, 22
Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml
In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources […]
May, 15
GPU-based JSON data processing using structural indexes
In recent years, ever larger amounts of data have been generated and stored every day. Big data is often processed by different software systems, which require a common data interchange format. JavaScript Object Notation, or JSON, is one of the most popular data exchange formats and is widely used in web and data-intensive applications. Unfortunately, […]
May, 15
SYCLops: A SYCL Specific LLVM to MLIR Converter
There is a growing need for higher level abstractions for device kernels in heterogeneous environments, and the multi-level nature of the MLIR infrastructure perfectly addresses this requirement. As SYCL begins to gain industry adoption for heterogeneous applications and MLIR continues to develop, we present SYCLops: a converter capable of translating SYCL specific LLVM IR to […]
May, 15
Can We Run in Parallel? Automating Loop Parallelization for TornadoVM
With the advent of multi-core systems, GPUs and FPGAs, loop parallelization has become a promising way to speed up program execution. To keep up with the times, various performance-oriented programming languages provide a multitude of constructs to allow programmers to write parallelizable loops. Correspondingly, researchers have developed techniques to automatically parallelize loops that do not […]
May, 15
Productive Performance Engineering for Weather and Climate Modeling with Python
Earth system models are developed with a tight coupling to target hardware, often containing highly specialized code predicated on processor characteristics. This coupling stems from using imperative languages that hard-code computation schedules and layout. In this work, we present a detailed account of optimizing the Finite Volume Cubed-Sphere (FV3) weather model, improving productivity and performance. By […]