29482

Posts

Oct, 27

Fastrack: Fast IO for Secure ML using GPU TEEs

As cloud-based ML expands, ensuring data security during training and inference is critical. GPU-based Trusted Execution Environments (TEEs) offer secure, high-performance solutions, with CPU TEEs managing data movement and GPU TEEs handling authentication and computation. However, CPU-to-GPU communication overheads significantly hinder performance, as data must be encrypted, authenticated, decrypted, and verified, increasing costs by 12.69 […]
Oct, 27

General-Purpose Computing on Tensor Processors

Modern computer systems have become heterogeneous and consist of many emerging kinds of hardware accelerators as Dennard Scaling discontinues. Also, such domain-specific hardware accelerators fulfill the rapidly growing computing demands for applications including artificial intelligence (AI) and machine learning (ML). Beyond conventional computer components such as central processing units (CPUs) and memory, modern computers typically […]
Oct, 27

Using modern C++ to improve CUDA programs

The classic style of writing and porting HPC applications to the GPU uses pointers to buffers or data-structures as kernel parameters. This style discards type information, leading to “flattening” of CPU-side data-structures before using them as kernel parameters, followed by a need to reconstruct them in GPU code to retain flexibility. In this thesis, we […]
Oct, 20

Efficient Configuration of Heterogeneous Resources and Task Scheduling Strategies in Deep Learning Auto-Tuning Systems

Deep Learning Automatic Hyperparameter Tuning plays a crucial role in advancing Artificial Intelligence applications, eliminating the need for complex expertise and costly manual operations. Ray Tune, developed by the University of California, Berkeley, has gained widespread adoption among notable companies like Amazon and Uber. In contrast to large enterprises, the hardware commonly used by the […]
Oct, 20

Testing GPU Numerics: Finding Numerical Differences Between NVIDIA and AMD GPUs

As scientific codes are ported between GPU platforms, continuous testing is required to ensure numerical robustness and identify numerical differences. Compiler-induced numerical differences occur when a program is compiled and run on different GPUs, and the numerical outcomes are different for the same input. We present a study of compiler-induced numerical differences between NVIDIA and […]
Oct, 20

Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach

Energy consumption has become a critical design metric and a limiting factor in the development of future computing architectures, from small wearable devices to large-scale leadership computing facilities. The predominant methods in energy management optimization are focused on CPUs. However, GPUs are increasingly significant and account for the majority of energy consumption in heterogeneous high […]
Oct, 20

Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models

The rapid growth in machine learning models, especially in natural language processing and computer vision, has led to challenges when running these models on hardware with limited resources. This paper introduces Superpipeline, a new framework designed to optimize the execution of large AI models on constrained hardware during both training and inference. Our approach involves […]
Oct, 20

Accelerating Drug Discovery in AutoDock-GPU with Tensor Cores

In drug discovery, molecular docking aims at characterizing the binding of a drug-like molecule to a macromolecule. AutoDock-GPU, a state-of-the-art docking software, estimates the geometrical conformation of a docked ligand-protein complex by minimizing a scoring function. Our profiling results indicate that the current reduction operation that is heavily used in the scoring function is sub-optimal. […]
Oct, 13

Optimized Code Generation for Parallel and Polyhedral Loop Nests using MLIR

In this thesis we show the benefits of the novel MLIR compiler technology to the generation of code from a DSL, namely EasyML used in openCARP, a widely used simulator in the cardiac electrophysiology community. Building on an existing work we deeply modified openCARP’s native code generator to enable efficient vectorized CPU and GPU code […]
Oct, 13

Deep Learning and Machine Learning with GPGPU and CUDA: Unlocking the Power of Parallel Computing

This book presents a comprehensive exploration of GPGPU (General Purpose Graphics Processing Unit) and its applications in deep learning and machine learning. It focuses on how parallel computing, particularly through the use of CUDA (Compute Unified Device Architecture), can unlock unprecedented computational power for complex tasks. The book provides detailed discussions on CPU and GPU […]
Oct, 13

Sound and Partially-Complete Static Analysis of Data-Races in GPU Programs

GPUs are progressively being integrated into modern society, playing a pivotal role in Artificial Intelligence and High-Performance Computing. Programmers need a deep understanding of the GPU programming model to avoid subtle data-races in their codes. Static verification that is sound and incomplete can guarantee data-race freedom, but the alarms it raises may be spurious and […]
Oct, 13

A domain-specific language for geospatial computations on the GPU

This thesis explores how a domain-specific language (DSL) for simple geospatial operators on the GPU can be developed, and evaluates the level of functionality and performance of such a DSL. The purpose of such a DSL is to simplify implementation of geospatial operators on the GPU, in order to increase productivity and performance. An embedded […]

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org