high performance computing on graphics processing units: hgpu.org

Posts

Jan, 7

Domain-Specific Code Language Models: Unraveling the Potential for HPC Codes and Tasks

With easier access to powerful compute resources, there is a growing trend in AI for software development to develop larger language models (LLMs) to address a variety of programming tasks. Even LLMs applied to tasks from the high-performance computing (HPC) domain are huge in size and demand expensive compute resources for training. This is partly […]

Jan, 7

An Autonomous Data Language

Nowadays, the main advances in computational power are due to parallelism. However, most parallel languages have been designed with a focus on processors and threads. This makes dealing with data and memory in programs hard, which distances the implementation from its original algorithm. We propose a new paradigm for parallel programming, the data-autonomous paradigm, where […]

CUDA • OpenCL

Jan, 7

Deep Learning Workload Scheduling in GPU Datacenters: A Survey

Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU accelerators have been collectively constructed into a GPU datacenter. An efficient scheduler design for a GPU datacenter is crucially important to reduce operational cost and improve […]

Dec, 31

Adding fault tolerance to OpenCL: Through redundant heterogeneous computing

The ever-increasing demand for computing has led to the need for specialized heterogeneous hardware, and the frameworks required to utilize them. Besides the traditional central processing units, more and more programs will make use of specialized hardware to accelerate computations. However, the increase in computing also leads to shorter mean time between failures. In this […]

OpenCL

Dec, 31

Optimization of Ported CFD Kernels on Intel Data Center GPU Max 1550 using oneAPI ESIMD

We describe our experience porting FUN3D’s CUDA-optimized kernels to Intel oneAPI SYCL. We faced several challenges, including foremost the suboptimal performance of the oneAPI code on Intel’s new data center GPU. Suboptimal performance of the oneAPI code was due primarily to high register spills, memory latency, and poor vectorization. We addressed these issues by implementing […]

CUDA

Dec, 31

Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations

Preparing for the deployment of large scientific and engineering codes on upcoming exascale systems with GPU-dense nodes is made challenging by the unprecedented diversity of device architectures and heterogeneous programming models. In this work, we evaluate the process of porting a massively parallel, fluid dynamics code written in CUDA to SYCL, HIP, and Kokkos with […]

CUDA

Dec, 31

Enabling Quantum Computer Simulations on AMD GPUs: a HIP Backend for Google’s qsim

Quantum computer simulators play a critical role in supporting the development and validation of quantum algorithms and hardware. This study focuses on porting Google’s qsim, a quantum computer simulator, to AMD Graphics Processing Units (GPUs). We leverage the existing qsim CUDA backend and harness the HIPIFY tool to provide a qsim HIP backend tailored for […]

CUDA

Dec, 31

Gaiwan: a Size-Polymorphic Typesystem for GPU Programs

General-purpose computing on graphics processing units (GPGPU) is increasingly used for number crunching tasks such as analyzing time series data. GPUs are a good fit for these tasks as they can execute many computations in parallel. To leverage this parallelism, the programmer is forced to carefully divide their input data into data blocks that are […]

OpenCL

Dec, 24

Experiences Building an MLIR-based SYCL Compiler

Similar to other programming models, compilers for SYCL, the open programming model for heterogeneous computing based on C++, would benefit from access to higher-level intermediate representations. The loss of high-level structure and semantics caused by premature lowering to low-level intermediate representations and the inability to reason about host and device code simultaneously present major challenges […]

Dec, 24

KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications

Nowadays, Graphics Processing Units (GPUs) dominate in a wide spectrum of computing realms and multi-task is increasingly applied in various complicated applications. To gain higher performance, multi-task programs require cumbersome programming efforts to take advantage of inter-kernel concurrency at source-code level. Although there exist works automatically scheduling kernels to enable inter-kernel concurrency, they all inevitably […]

CUDA

Dec, 24

FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data

While both the database and high-performance computing (HPC) communities utilize lossless compression methods to minimize floating-point data size, a disconnect persists between them. Each community designs and assesses methods in a domain-specific manner, making it unclear if HPC compression techniques can benefit database applications or vice versa. With the HPC community increasingly leaning towards in-situ […]

CUDA

Dec, 24

Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models

Binary code summarization, while invaluable for understanding code semantics, is challenging due to its labor-intensive nature. This study delves into the potential of large language models (LLMs) for binary code comprehension. To this end, we present BinSum, a comprehensive benchmark and dataset of over 557K binary functions and introduce a novel method for prompt synthesis […]

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs


* * *

Recent source codes

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository
