
Posts

Aug, 20

Generating Parallel OpenCL and OpenMP Programs from Dataflow Graphs

This thesis describes and analyzes the automatic generation of threads from a sequential MiniC program by translating the program into an equivalent dataflow graph and partitioning that graph. The threads are generated through different graph partitionings, including splitting the graph into its individual nodes and calculating a minimum vertex-disjoint cover. The threads can be […]
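As a rough illustration of the simplest partitioning mentioned above, the sketch below (ours, not the thesis' MiniC pipeline) spawns one thread per dataflow node. A real generator would also insert synchronization along the graph's edges, which this sketch omits; the `Node` type is hypothetical.

```cpp
// Minimal sketch of the one-thread-per-node partitioning: every vertex of
// the dataflow graph becomes its own partition. Edge synchronization is
// deliberately omitted for brevity.
#include <functional>
#include <thread>
#include <vector>

struct Node {
    std::function<void()> work;  // hypothetical: the computation of this node
};

void run_single_node_partitioning(const std::vector<Node>& graph) {
    std::vector<std::thread> threads;
    threads.reserve(graph.size());
    for (const Node& n : graph)
        threads.emplace_back(n.work);  // spawn one thread per node
    for (std::thread& t : threads)
        t.join();                      // wait for all partitions to finish
}
```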
Aug, 20

Porting Batched Iterative Solvers onto Intel GPUs with SYCL

Batched linear solvers play a vital role in computational sciences, especially in the fields of plasma physics and combustion simulations. With the imminent deployment of the Aurora Supercomputer and other upcoming systems equipped with Intel GPUs, there is a compelling demand to expand the capabilities of these solvers for Intel GPU architectures. In this paper, […]
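To make the batched setting concrete, here is a common SYCL dispatch pattern for batched work, with one work-group per batch entry. Assumptions: `matrices` and `rhs` are USM device pointers holding the dense systems back to back, and `solve_one_system` is a hypothetical per-system solver body, not the paper's implementation.

```cpp
#include <sycl/sycl.hpp>

void solve_batch(sycl::queue& q, float* matrices, float* rhs,
                 int n, int num_batch_entries, int group_size) {
    const size_t local  = static_cast<size_t>(group_size);
    const size_t global = static_cast<size_t>(num_batch_entries) * local;

    q.submit([&](sycl::handler& h) {
        h.parallel_for(
            sycl::nd_range<1>{sycl::range<1>{global}, sycl::range<1>{local}},
            [=](sycl::nd_item<1> item) {
                const size_t batch_id = item.get_group(0);  // one group per system
                // Each work-group cooperatively solves its own small system:
                // solve_one_system(matrices + batch_id * n * n,
                //                  rhs + batch_id * n, n, item);
                (void)batch_id;
            });
    }).wait();
}
```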
Aug, 20

APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics

The prediction of protein 3D structure from amino acid sequence is a computational grand challenge in biophysics and plays a key role in applications ranging from drug discovery to genome interpretation. The advent of AI models, such as AlphaFold, is revolutionizing applications that depend on robust protein structure prediction algorithms. To maximize […]
Aug, 20

Increased reliability on Intel GPUs via software diverse redundancy

Over the past decade, industry has revolutionized its processes by incorporating Artificial Intelligence. This revolution now extends from manufacturing to more critical sectors, such as avionics, automotive, and healthcare, where errors are unacceptable. One clear example is the automotive industry, where the installation of Advanced Driver Assistance […]
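The core redundancy idea can be sketched independently of the GPU details: run two diverse implementations of the same computation and treat a mismatch as a detected error. The C++ below is our illustration only; the paper's mechanism replicates kernels across distinct Intel GPU execution units rather than CPU functions.

```cpp
// Illustrative software diverse redundancy: two diverse implementations of
// the same reduction, with a divergence check acting as the error detector.
#include <cmath>
#include <cstdio>
#include <vector>

float sum_forward(const std::vector<float>& v) {
    float s = 0.0f;
    for (size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

float sum_backward(const std::vector<float>& v) {  // diverse redundant version
    float s = 0.0f;
    for (size_t i = v.size(); i-- > 0; ) s += v[i];
    return s;
}

bool redundant_check(const std::vector<float>& v, float tol = 1e-4f) {
    const float a = sum_forward(v), b = sum_backward(v);
    if (std::fabs(a - b) > tol) {  // divergence => suspected fault
        std::fprintf(stderr, "redundancy mismatch: %f vs %f\n", a, b);
        return false;
    }
    return true;
}
```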
Aug, 20

Quantifying OpenMP: Statistical Insights into Usage and Adoption

In high-performance computing (HPC), the demand for efficient parallel programming models has grown dramatically since the end of Dennard Scaling and the subsequent move to multi-core CPUs. OpenMP stands out as a popular choice due to its simplicity and portability, offering a directive-driven approach for shared-memory parallel programming. Despite its wide adoption, however, there is […]
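The directive-driven approach the abstract refers to looks like this in practice: a single pragma turns a sequential loop into a shared-memory parallel one.

```cpp
// Canonical OpenMP work-sharing loop with a reduction; the thread count is
// left to the runtime (e.g., the OMP_NUM_THREADS environment variable).
#include <omp.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f);
    float dot = 0.0f;

    #pragma omp parallel for reduction(+ : dot)
    for (int i = 0; i < n; ++i)
        dot += a[i] * b[i];

    std::printf("dot = %.0f using up to %d threads\n", dot, omp_get_max_threads());
}
```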
Aug, 13

gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. To address this issue, traditional approaches integrate lossy compression directly into GPU-aware collectives, which still suffer from serious issues such as underutilized GPU devices and uncontrolled data distortion. In this paper, we propose gZCCL, a general framework […]
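The general pattern behind compression-accelerated collectives can be sketched as: compress locally, move fewer bytes through the collective, decompress on arrival. In the sketch below, `lossy_compress` and `lossy_decompress` are hypothetical stand-ins for a GPU compressor; this is not gZCCL's actual pipeline.

```cpp
#include <mpi.h>
#include <vector>

std::vector<char> lossy_compress(const std::vector<float>& data);     // hypothetical
std::vector<float> lossy_decompress(const std::vector<char>& bytes);  // hypothetical

void compressed_bcast(std::vector<float>& data, int root, MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);

    std::vector<char> bytes;
    int nbytes = 0;
    if (rank == root) {
        bytes = lossy_compress(data);            // shrink the payload first
        nbytes = static_cast<int>(bytes.size());
    }
    MPI_Bcast(&nbytes, 1, MPI_INT, root, comm);  // ship the compressed size
    bytes.resize(nbytes);
    MPI_Bcast(bytes.data(), nbytes, MPI_CHAR, root, comm);  // then the payload
    if (rank != root)
        data = lossy_decompress(bytes);          // reconstruct locally
}
```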
Aug, 13

A Model Extraction Attack on Deep Neural Networks Running on GPUs

Deep Neural Networks (DNNs) have become ubiquitous due to their performance on prediction and classification problems. However, they face a variety of threats as their usage spreads. Model extraction attacks, which steal DNN models, endanger intellectual property, data privacy, and security. Previous research has shown that system-level side channels can be used to leak the […]
Aug, 13

SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving

Energy-efficient computing uses power management techniques such as frequency scaling to save energy. Implementing energy-efficient techniques on large-scale computing systems is challenging for several reasons. While most modern architectures, including GPUs, are capable of frequency scaling, these features are often not available on large systems. In addition, achieving higher energy savings requires precise energy tuning […]
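For concreteness, one way to apply the frequency scaling such techniques rely on is NVIDIA's NVML clock-locking interface; the paper itself targets SYCL applications across vendors, and the clock values below are placeholders.

```cpp
// Lock GPU core clocks to a lower range to trade performance for energy,
// then restore defaults. Typically requires elevated privileges; link with
// -lnvidia-ml.
#include <nvml.h>
#include <cstdio>

int main() {
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    if (nvmlDeviceSetGpuLockedClocks(dev, 900, 900) == NVML_SUCCESS)
        std::puts("GPU core clock locked to 900 MHz");

    // ... run the kernel of interest and measure energy here ...

    nvmlDeviceResetGpuLockedClocks(dev);  // restore default clock behavior
    nvmlShutdown();
}
```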
Aug, 13

Static and Dynamic Analyses for Efficient GPU Execution

In this thesis we describe a host of static and dynamic techniques for efficient execution of GPU programs. Most significant is the array short-circuiting technique, which automatically rewrites array updates and concatenations to happen in-place when deemed safe. The optimization is based on FunMem, an intermediate representation with non-semantic memory information that we also introduce. […]
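The in-place idea can be illustrated outside Futhark (where the thesis applies it via the FunMem IR): a functional-style array update semantically copies, but when the old value is provably dead, the same operation can safely mutate in place.

```cpp
// Copy-on-update versus short-circuited in-place update; this C++ analogy
// is ours, not the thesis' compiler transformation.
#include <utility>
#include <vector>

// Semantically: returns a copy of `a` with a[i] = v. Costs O(n).
std::vector<int> updated_copy(std::vector<int> a, size_t i, int v) {
    a[i] = v;  // `a` was passed by value, so the caller's array survives
    return a;
}

// Short-circuited form: the caller moves its array in (no other uses),
// so the copy disappears and the update happens in place in O(1).
std::vector<int> updated_inplace(std::vector<int>&& a, size_t i, int v) {
    a[i] = v;
    return std::move(a);
}
```

Called as `xs = updated_inplace(std::move(xs), 3, 42);`, the update is O(1) rather than O(n); automatically proving that such a rewrite is safe is exactly what the static analysis is for.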
Aug, 13

Isolated Scheduling for Distributed Training Tasks in GPU Clusters

Distributed machine learning (DML) technology makes it possible to train large neural networks in a reasonable amount of time. Meanwhile, as computing power grows much faster than network capacity, network communication has gradually become the bottleneck of DML. Current multi-tenant GPU clusters face network contention caused by the hash-collision problem, which not only further increases […]
Jul, 30

Monadic Deep Learning

The Java and Scala community has built a very successful big-data ecosystem. However, most neural networks running on it are modeled in dynamically typed programming languages. These dynamically typed deep learning frameworks treat neural networks as differentiable expressions that contain many trainable variables, and perform automatic differentiation on those expressions when training them. […]
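To ground the phrase "differentiable expressions that contain many trainable variables", here is a minimal statically typed reverse-mode autodiff sketch in C++; the paper's contribution is doing this monadically in Scala, and nothing below is its API.

```cpp
// A tape records, per operation, how to propagate gradients backward
// through the expression; trainable variables accumulate their gradients.
#include <cstdio>
#include <functional>
#include <memory>
#include <vector>

struct Tape {
    std::vector<std::function<void()>> backprop;  // reverse-order gradient steps
};

struct Var {          // a trainable scalar variable
    double value;
    double grad = 0.0;
};

Var* mul(Tape& t, Var* a, Var* b, std::vector<std::unique_ptr<Var>>& pool) {
    pool.push_back(std::make_unique<Var>(Var{a->value * b->value}));
    Var* out = pool.back().get();
    t.backprop.push_back([=] {   // chain rule for multiplication
        a->grad += b->value * out->grad;
        b->grad += a->value * out->grad;
    });
    return out;
}

int main() {
    Tape tape;
    std::vector<std::unique_ptr<Var>> pool;
    Var x{3.0}, y{4.0};
    Var* z = mul(tape, &x, &y, pool);  // z = x * y
    z->grad = 1.0;                     // seed dz/dz = 1
    for (auto it = tape.backprop.rbegin(); it != tape.backprop.rend(); ++it)
        (*it)();                       // run the tape in reverse
    std::printf("dz/dx = %.1f, dz/dy = %.1f\n", x.grad, y.grad);  // 4.0, 3.0
}
```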
Jul, 30

Bandicoot: C++ Library for GPU Linear Algebra and Scientific Computing

This report provides an introduction to the Bandicoot C++ library for GPU linear algebra and scientific computing, detailing its user interface and performance characteristics as well as the technical details of its internal design. Bandicoot is the GPU-enabled counterpart to the well-known Armadillo C++ linear algebra library, aimed at allowing users to enable GPU computation […]
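Based on the report's description of Bandicoot mirroring Armadillo's interface, usage plausibly looks like the following; this is a sketch from that description, and the report itself is the authority on exact API details.

```cpp
#include <bandicoot>

int main() {
    coot::fmat A(1024, 1024, coot::fill::randu);  // matrices live on the GPU
    coot::fmat B(1024, 1024, coot::fill::randu);

    coot::fmat C = A * B;          // matrix product executes on the device

    float total = coot::accu(C);   // reduction; scalar returned to the host
    (void)total;
}
```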

