Posts

Jan, 14

Code Generation for a Variety of Accelerators for a Graph DSL

Sparse graphs are ubiquitous in real and virtual worlds. With the phenomenal growth in semi-structured and unstructured data, sizes of the underlying graphs have witnessed a rapid growth over the years. Analyzing such large structures necessitates parallel processing, which is challenged by the intrinsic irregularity of sparse computation, memory access, and communication. It would be […]

Jan, 7

Deep Learning for Obfuscated Code Analysis

Modern software development relies increasingly on third-party code dependencies, which enables rapid development but also increases the risk of introducing bugs, malware, or unauthorized intellectual property. The goal of this dissertation is to reduce these risks by making them easier to detect. Determining the meaning of an arbitrary program reduces to solving the halting problem, which is […]

Jan, 7

UniFL: Accelerating Federated Learning Using Heterogeneous Hardware Under a Unified Framework

Federated learning (FL) is now considered a critical method for breaking down data silos. However, data encryption can significantly increase computing time, limiting its large-scale deployment. While hardware acceleration can be an effective solution, existing research has largely focused on a single hardware type, which hinders the acceleration of FL across the various heterogeneous hardware […]

Jan, 7

Domain-Specific Code Language Models: Unraveling the Potential for HPC Codes and Tasks

With easier access to powerful compute resources, there is a growing trend in AI for software development to develop larger language models (LLMs) to address a variety of programming tasks. Even LLMs applied to tasks from the high-performance computing (HPC) domain are huge in size and demand expensive compute resources for training. This is partly […]

Jan, 7

An Autonomous Data Language

Nowadays, the main advances in computational power are due to parallelism. However, most parallel languages have been designed with a focus on processors and threads. This makes dealing with data and memory in programs hard, which distances the implementation from its original algorithm. We propose a new paradigm for parallel programming, the data-autonomous paradigm, where […]

Jan, 7

Deep Learning Workload Scheduling in GPU Datacenters: A Survey

Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU accelerators have been collectively constructed into a GPU datacenter. An efficient scheduler design for a GPU datacenter is crucially important to reduce operational cost and improve […]

Dec, 31

Adding fault tolerance to OpenCL: Through redundant heterogeneous computing

The ever-increasing demand for computing has led to the need for specialized heterogeneous hardware, and the frameworks required to utilize them. Besides the traditional central processing units, more and more programs will make use of specialized hardware to accelerate computations. However, the increase in computing also leads to shorter mean time between failures. In this […]

Dec, 31

Optimization of Ported CFD Kernels on Intel Data Center GPU Max 1550 using oneAPI ESIMD

We describe our experience porting FUN3D’s CUDA-optimized kernels to Intel oneAPI SYCL. We faced several challenges, including foremost the suboptimal performance of the oneAPI code on Intel’s new data center GPU. Suboptimal performance of the oneAPI code was due primarily to high register spills, memory latency, and poor vectorization. We addressed these issues by implementing […]

Dec, 31

Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations

Preparing for the deployment of large scientific and engineering codes on upcoming exascale systems with GPU-dense nodes is made challenging by the unprecedented diversity of device architectures and heterogeneous programming models. In this work, we evaluate the process of porting a massively parallel, fluid dynamics code written in CUDA to SYCL, HIP, and Kokkos with […]

Dec, 31

Enabling Quantum Computer Simulations on AMD GPUs: a HIP Backend for Google’s qsim

Quantum computer simulators play a critical role in supporting the development and validation of quantum algorithms and hardware. This study focuses on porting Google’s qsim, a quantum computer simulator, to AMD Graphics Processing Units (GPUs). We leverage the existing qsim CUDA backend and harness the HIPIFY tool to provide a qsim HIP backend tailored for […]

Dec, 31

Gaiwan: a Size-Polymorphic Typesystem for GPU Programs

General-purpose computing on graphics processing units (GPGPU) is increasingly used for number crunching tasks such as analyzing time series data. GPUs are a good fit for these tasks as they can execute many computations in parallel. To leverage this parallelism, the programmer is forced to carefully divide their input data into data blocks that are […]

Dec, 24

KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications

Nowadays, Graphics Processing Units (GPUs) dominate in a wide spectrum of computing realms and multi-task is increasingly applied in various complicated applications. To gain higher performance, multi-task programs require cumbersome programming efforts to take advantage of inter-kernel concurrency at source-code level. Although there exist works automatically scheduling kernels to enable inter-kernel concurrency, they all inevitably […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors