28959

Posts

Jan, 14

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis

Single-Program-Multiple-Data (SPMD) parallelism has recently been adopted to train large deep neural networks (DNNs). Few studies have explored its applicability on heterogeneous clusters, to fully exploit available resources for large model learning. This paper presents HAP, an automated system designed to expedite SPMD DNN training on heterogeneous clusters. HAP jointly optimizes the tensor sharding strategy, […]
Jan, 14

Code Generation for a Variety of Accelerators for a Graph DSL

Sparse graphs are ubiquitous in real and virtual worlds. With the phenomenal growth in semi-structured and unstructured data, sizes of the underlying graphs have witnessed a rapid growth over the years. Analyzing such large structures necessitates parallel processing, which is challenged by the intrinsic irregularity of sparse computation, memory access, and communication. It would be […]
Jan, 7

Deep Learning for Obfuscated Code Analysis

Modern software development relies increasingly on third-party code dependencies, which enables rapid development but also increases risk of introducing bugs, malware, or unauthorized intellectual property. The goal of this dissertation is to reduce these risks making them easier to detect. Determining the meaning of an arbitrary program reduces to solving the halting problem, which is […]
Jan, 7

UniFL: Accelerating Federated Learning Using Heterogeneous Hardware Under a Unified Framework

Federated learning (FL) is now considered a critical method for breaking down data silos. However, data encryption can significantly increase computing time, limiting its large-scale deployment. While hardware acceleration can be an effective solution, existing research has largely focused on a single hardware type, which hinders the acceleration of FL across the various heterogeneous hardware […]
Jan, 7

Domain-Specific Code Language Models: Unraveling the Potential for HPC Codes and Tasks

With easier access to powerful compute resources, there is a growing trend in AI for software development to develop larger language models (LLMs) to address a variety of programming tasks. Even LLMs applied to tasks from the high-performance computing (HPC) domain are huge in size and demand expensive compute resources for training. This is partly […]
Jan, 7

An Autonomous Data Language

Nowadays, the main advances in computational power are due to parallelism. However, most parallel languages have been designed with a focus on processors and threads. This makes dealing with data and memory in programs hard, which distances the implementation from its original algorithm. We propose a new paradigm for parallel programming, the data-autonomous paradigm, where […]
Jan, 7

Deep Learning Workload Scheduling in GPU Datacenters: A Survey

Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU accelerators have been collectively constructed into a GPU datacenter. An efficient scheduler design for a GPU datacenter is crucially important to reduce operational cost and improve […]
Dec, 31

Adding fault tolerance to OpenCL: Through redundant heterogeneous computing

The ever-increasing demand for computing has led to the need for specialized heterogeneous hardware, and the frameworks required to utilize them. Besides the traditional central processing units, more and more programs will make use of specialized hardware to accelerate computations. However, the increase in computing also leads to shorter mean time between failures. In this […]
Dec, 31

Optimization of Ported CFD Kernels on Intel Data Center GPU Max 1550 using oneAPI ESIMD

We describe our experience porting FUN3D’s CUDA-optimized kernels to Intel oneAPI SYCL. We faced several challenges, including foremost the suboptimal performance of the oneAPI code on Intel’s new data center GPU. Suboptimal performance of the oneAPI code was due primarily to high register spills, memory latency, and poor vectorization. We addressed these issues by implementing […]
Dec, 31

Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations

Preparing for the deployment of large scientific and engineering codes on upcoming exascale systems with GPU-dense nodes is made challenging by the unprecedented diversity of device architectures and heterogeneous programming models. In this work, we evaluate the process of porting a massively parallel, fluid dynamics code written in CUDA to SYCL, HIP, and Kokkos with […]
Dec, 31

Enabling Quantum Computer Simulations on AMD GPUs: a HIP Backend for Google’s qsim

Quantum computer simulators play a critical role in supporting the development and validation of quantum algorithms and hardware. This study focuses on porting Google’s qsim, a quantum computer simulator, to AMD Graphics Processing Units (GPUs). We leverage the existing qsim CUDA backend and harness the HIPIFY tool to provide a qsim HIP backend tailored for […]
Dec, 31

Gaiwan: a Size-Polymorphic Typesystem for GPU Programs

General-purpose computing on graphics processing units (GPGPU) is increasingly used for number crunching tasks such as analyzing time series data. GPUs are a good fit for these tasks as they can execute many computations in parallel. To leverage this parallelism, the programmer is forced to carefully divide their input data into data blocks that are […]

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us: