Posts
Aug 13
gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters
GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. To address this issue, traditional approaches integrate lossy compression directly into GPU-aware collectives, but these still suffer from serious issues such as underutilized GPU devices and uncontrolled data distortion. In this paper, we propose gZCCL, a general framework […]
Aug 13
A Model Extraction Attack on Deep Neural Networks Running on GPUs
Deep Neural Networks (DNNs) have become ubiquitous due to their performance on prediction and classification problems. However, they face a variety of threats as their usage spreads. Model extraction attacks, which steal DNN models, endanger intellectual property, data privacy, and security. Previous research has shown that system-level side channels can be used to leak the […]
Aug 13
SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving
Energy-efficient computing uses power management techniques such as frequency scaling to save energy. Implementing energy-efficient techniques on large-scale computing systems is challenging for several reasons. While most modern architectures, including GPUs, are capable of frequency scaling, these features are often not available on large systems. In addition, achieving higher energy savings requires precise energy tuning […]
Aug 13
Static and Dynamic Analyses for Efficient GPU Execution
In this thesis we describe a host of static and dynamic techniques for efficient execution of GPU programs. Most significant is the array short-circuiting technique, which automatically rewrites array updates and concatenations to happen in-place when deemed safe. The optimization is based on FunMem, an intermediate representation with non-semantic memory information that we also introduce. […]
Aug 13
Isolated Scheduling for Distributed Training Tasks in GPU Clusters
Distributed machine learning (DML) technology makes it possible to train large neural networks in a reasonable amount of time. Meanwhile, as computing power grows much faster than network capacity, network communication has gradually become the bottleneck of DML. Current multi-tenant GPU clusters face network contention caused by the hash-collision problem, which not only further increases […]
Jul 30
Monadic Deep Learning
The Java and Scala community has built a very successful big data ecosystem. However, most neural networks running on it are modeled in dynamically typed programming languages. These dynamically typed deep learning frameworks treat neural networks as differentiable expressions that contain many trainable variables, and perform automatic differentiation on those expressions when training them. […]
Jul 30
Bandicoot: C++ Library for GPU Linear Algebra and Scientific Computing
This report provides an introduction to the Bandicoot C++ library for GPU linear algebra and scientific computing, detailing its user interface and performance characteristics as well as the technical details of its internal design. Bandicoot is the GPU-enabled counterpart to the well-known Armadillo C++ linear algebra library, aimed at allowing users to enable GPU computation […]
Jul 30
Efficiency without Tears: Securing Multilingual Programs with TRINITY
Although most real-world programs are developed in multiple languages in the era of data science, existing security techniques are still limited to single-language programs. Worse yet, languages designed for high-performance computing often omit the necessary security checks in foreign function interfaces (FFI) in pursuit of maximum execution efficiency. As a consequence, security flaws and […]
Jul 30
Fast Knowledge Graph Completion using Graphics Processing Units
Knowledge graphs can be used in many areas related to data semantics, such as question-answering systems and knowledge-based systems. However, currently constructed knowledge graphs need to be supplemented with additional relations to improve their coverage of knowledge; this task is called knowledge graph completion. To add new relations to the existing knowledge graph by using knowledge graph […]
Jul 30
A portable C++ library for memory and compute abstraction on multi-core CPUs and GPUs
We present a C++ library for transparent memory and compute abstraction across CPU and GPU architectures. Our library combines generic data structures like vectors, multi-dimensional arrays, maps, graphs, and sparse grids with basic generic algorithms like arbitrary-dimensional convolutions, copying, merging, sorting, prefix sum, reductions, neighbor search, and filtering. The memory layout of the data structures […]
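Among the generic algorithms the excerpt lists are prefix sum and reductions. As a language-agnostic illustration of their semantics only (the library's actual C++ API is not shown in the excerpt), a minimal sketch:

```python
# Illustrative semantics of two generic algorithms named in the abstract:
# inclusive prefix sum (scan) and reduction with a binary operator.
# This is a plain-Python sketch, not the library's API.
from itertools import accumulate
from functools import reduce

data = [3, 1, 4, 1, 5]

prefix = list(accumulate(data))           # inclusive prefix sum
total = reduce(lambda a, b: a + b, data)  # reduction over the whole array

print(prefix)  # [3, 4, 8, 9, 14]
print(total)   # 14
```

On GPUs, both operations are typically parallelized in logarithmic depth, which is why libraries expose them as built-in primitives rather than leaving users to write sequential loops.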
Jul 24
ProtoX: A First Look
We present a first look at ProtoX, a code generation framework for stencil and pointwise operations that occur frequently in the numerical solution of partial differential equations. ProtoX has Proto as its library frontend and SPIRAL as the backend. Proto is a C++-based domain-specific library which optimizes the algorithms used to compute the […]
Jul 24
qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers
We propose a general framework for decoding quantum error-correcting codes with generative modeling. The model utilizes autoregressive neural networks, specifically Transformers, to learn the joint probability of logical operators and syndromes. The training is unsupervised, requiring no labeled training data, and is thus referred to as pre-training. After the pre-training, […]
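The core modeling idea the excerpt describes, learning a joint probability autoregressively, amounts to factorizing the distribution by the chain rule, p(x1..xn) = ∏ p(xi | x1..x(i-1)). A toy sketch over binary variables, with a hand-written conditional standing in for the learned Transformer (the conditional's form is an assumption purely for illustration):

```python
# Toy autoregressive factorization: p(x1..xn) = prod_i p(x_i | x_1..x_{i-1}).
# A fixed, hypothetical conditional stands in for the Transformer; the real
# model would learn these conditionals from syndrome/logical-operator data.
from itertools import product

def cond_prob(bit, prefix):
    # Hypothetical conditional: the next bit is biased by the prefix parity.
    p1 = 0.8 if sum(prefix) % 2 == 1 else 0.3
    return p1 if bit == 1 else 1.0 - p1

def joint_prob(bits):
    # Chain-rule product of the per-variable conditionals.
    p = 1.0
    for i, b in enumerate(bits):
        p *= cond_prob(b, bits[:i])
    return p

# Because each conditional is normalized, the factorization defines a
# valid joint distribution: probabilities over all outcomes sum to 1.
total = sum(joint_prob(x) for x in product([0, 1], repeat=3))
print(round(total, 10))  # 1.0
```

The same factorization is what lets an autoregressive model both score a given (syndrome, logical-operator) configuration and sample candidates one variable at a time.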