high performance computing on graphics processing units: hgpu.org

Posts

Apr, 21

Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs

FPGAs are a promising platform for accelerating Deep Learning (DL) applications, due to their high performance, low power consumption, and reconfigurability. Recently, the leading FPGA vendors have enhanced their architectures to more efficiently support the computational demands of DL workloads. However, the two most prominent AI-optimized FPGAs, i.e., AMD/Xilinx Versal ACAP and Intel Stratix 10 […]

Apr, 21

Software Optimization and Orchestration for Heterogeneous and Distributed Architectures

In the context of the Edge-Cloud computing continuum, containerization and orchestration have become two key requirements in software development best practices. Containerization allows for better resource utilization, platform-independent development, and secure software deployment. Orchestration automates the deployment, networking, scaling, and availability of containerized workloads and services. However, there are still several open challenges. First, the […]

Apr, 21

SimSYCL: A SYCL Implementation Targeting Development, Debugging, Simulation and Conformance

The open SYCL standard has established itself as a cross-vendor, cross-platform means to develop software which benefits from GPU and accelerator parallelism. Inherent difficulties in portability between and debuggability of programs for these targets remain. However, as we demonstrate, the SYCL specification lends itself to be implemented purely in software in a manner that is […]

Apr, 21

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

With the rapid growth in the volume of data sets, models, and devices in the domain of deep learning, there is increasing attention on large-scale distributed deep learning. In contrast to traditional distributed deep learning, the large-scale scenario poses new challenges that include fault tolerance, scalability of algorithms and infrastructures, and heterogeneity in data sets, […]

CUDA

Apr, 21

Python-Based Quantum Chemistry Calculations with GPU Acceleration

To meet the increasing demand of quantum chemistry calculations in data-driven chemical research, the collaboration between industrial stakeholders and the quantum chemistry community has led to the development of GPU4PySCF, a GPU-accelerated Python package. This open-source project is accessible via its public GitHub repository. This paper outlines the primary features, innovations, and advantages of this […]

CUDA

Apr, 14

A Systematic Literature Survey of Sparse Matrix-Vector Multiplication

Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various optimization contributions. However, the comprehensive and systematic literature survey that introduces, analyzes, discusses, and summarizes the advancements of SpMV in recent years is currently […]

Apr, 14

QArray: a GPU-accelerated constant capacitance model simulator for large quantum dot arrays

Semiconductor quantum dot arrays are a leading architecture for the development of quantum technologies. Over the years, the constant capacitance model has served as a fundamental framework for simulating, understanding, and navigating the charge stability diagrams of small quantum dot arrays. However, while the size of the arrays keeps growing, solving the constant capacitance model […]

Apr, 14

Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

Reducing the need for users to manually manage the details of work and data distribution is an important goal of high-level many-task runtime systems. For distributed memory platforms this means that the runtime system has to keep track of both fine-grained task dependencies and data residency meta-information. The amount of such meta-information is proportional to […]

Apr, 14

OpenMP offload at the Exascale using Intel GPU Max 1550: evaluation of STREAmS compressible solver

Nearly 20 years after the birth of general purpose GPU computing, the HPC landscape is now dominated by GPUs. After years of undisputed dominance by NVIDIA, new players have entered the arena in a convincing manner, namely AMD and more recently Intel, whose devices currently power the first two clusters in the Top500 ranking. Unfortunately, […]

Apr, 14

High Performance Privacy Preserving AI

Artificial intelligence (AI) depends on data. In sensitive domains – such as healthcare, security, finance, and many more – there is therefore tension between unleashing the power of AI and maintaining the confidentiality and security of the relevant data. This book – intended for researchers in academia and R&D engineers in industry – explains how […]

Apr, 7

Seer: Predictive Runtime Kernel Selection for Irregular Problems

Modern GPUs are designed for regular problems and suffer from load imbalance when processing irregular data. Prior to our work, a domain expert selects the best kernel to map fine-grained irregular parallelism to a GPU. We instead propose Seer, an abstraction for producing a simple, reproducible, and understandable decision tree selector model which performs runtime […]

Apr, 7

Using Intel oneAPI for Multi-hybrid Acceleration Programming with GPU and FPGA Coupling

Intel oneAPI is a programming framework that accepts various accelerators such as GPUs, FPGAs, and multi-core CPUs, with a focus on HPC applications. Users can apply their code written in a single language, DPC++, to this heterogeneous programming environment. However, in practice, it is not easy to apply to different accelerators, especially for non-Intel devices […]

CUDA • OpenCL

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)