
Posts

Jun 9

A Survey on Evaluating and Optimizing Performance of Intel Xeon Phi

Intel’s Xeon Phi combines the parallel processing power of a many-core accelerator with the programming ease of CPUs. In this paper, we present a survey of works that study the architecture of Phi and use it as an accelerator for a broad range of applications. We review performance optimization strategies as well as the factors […]
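
Context for the "programming ease" claim: unlike GPU accelerators, Xeon Phi runs standard CPU programming models such as OpenMP, so existing parallel loops port with little change. A minimal sketch of the kind of many-core loop such studies benchmark (illustrative only, not taken from the survey):

    // Minimal OpenMP loop; Xeon Phi exposes its ~60+ cores through the
    // same shared-memory model as an ordinary CPU (illustrative sketch).
    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

        #pragma omp parallel for  // no Phi-specific code required
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];

        std::printf("threads: %d, c[0] = %f\n", omp_get_max_threads(), c[0]);
        return 0;
    }
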
Jun 9

PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion

OpenCL offers code portability but no performance portability. Given an OpenCL program X specifically written for one platform P, existing OpenCL compilers, which usually optimize its host and kernel codes individually, often yield poor performance for another platform Q. Instead of obtaining a performance-improved version of X for Q via manual tuning, we aim to […]
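
For readers unfamiliar with the split the abstract refers to: an OpenCL program pairs host code (device discovery, buffer management, kernel launch) with kernel code compiled at run time, and conventional compilers optimize the two separately. A minimal, self-contained sketch of that structure follows (error handling omitted; it illustrates what PPOpenCL fuses, not PPOpenCL's technique):

    // Host + kernel split in plain OpenCL (error checks omitted for brevity).
    #include <CL/cl.h>
    #include <cstdio>
    #include <vector>

    // Kernel code: a string compiled at run time for the chosen device.
    static const char* kSrc = R"(
    __kernel void scale(__global float* x, float s) {
        size_t i = get_global_id(0);
        x[i] *= s;
    })";

    int main() {
        const size_t n = 1024;
        std::vector<float> data(n, 1.0f);

        // Host code: set up the device, build the kernel, move data, launch.
        cl_platform_id plat; clGetPlatformIDs(1, &plat, nullptr);
        cl_device_id dev;    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, nullptr);
        cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, nullptr);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, nullptr);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, nullptr);
        clBuildProgram(prog, 1, &dev, "", nullptr, nullptr);
        cl_kernel k = clCreateKernel(prog, "scale", nullptr);

        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    n * sizeof(float), data.data(), nullptr);
        float s = 2.0f;
        clSetKernelArg(k, 0, sizeof(buf), &buf);
        clSetKernelArg(k, 1, sizeof(s), &s);
        clEnqueueNDRangeKernel(q, k, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), data.data(),
                            0, nullptr, nullptr);
        std::printf("data[0] = %f\n", data[0]);  // 2.0
        return 0;
    }
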
Jun 9

Diagnosing Performance Bottlenecks in HPC Applications

The software performance optimization process is one of the most challenging aspects of developing highly performant code, because the underlying performance limitations are hard to diagnose. In many cases, identifying performance bottlenecks, such as latency stalls, requires a combination of fidelity and usability that existing tools do not provide: traditional performance models and runtime analysis lack […]
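
A concrete example of a latency stall: in a dependent pointer chase, each load must wait for the previous one, so the core idles for a full memory round-trip per iteration, yet the source looks no different from a throughput-bound loop. A minimal microbenchmark sketch:

    // Pointer chase: every load depends on the previous one, so each
    // iteration pays full memory latency. Such stalls are invisible in
    // source code, which is what makes them hard to diagnose.
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <utility>
    #include <vector>

    int main() {
        const size_t n = 1 << 24;  // far larger than the last-level cache
        std::vector<size_t> next(n);
        std::iota(next.begin(), next.end(), size_t{0});

        // Sattolo's algorithm: a single-cycle permutation, so the chase
        // touches all n slots in a cache-hostile order.
        std::mt19937_64 rng{42};
        for (size_t k = n - 1; k > 0; --k)
            std::swap(next[k], next[rng() % k]);

        auto t0 = std::chrono::steady_clock::now();
        size_t i = 0;
        for (size_t step = 0; step < n; ++step)
            i = next[i];  // serialized dependent loads: one stall each
        auto t1 = std::chrono::steady_clock::now();

        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / n;
        std::printf("%.1f ns per dependent load (i = %zu)\n", ns, i);
        return 0;
    }
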
Jun 9

LAMDA: Learning-Assisted Multi-Stage Autotuning for FPGA Design Closure

A primary barrier to rapid hardware specialization with FPGAs stems from the weak guarantees of existing CAD tools on achieving design closure. Current methodologies require extensive manual effort to configure a large set of options across multiple stages of the toolflow in order to achieve high quality-of-results. Due to the size and complexity of the design space […]
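
To make the search problem concrete, the sketch below does blind random search over a hypothetical two-knob option space with an invented stand-in for the quality-of-results measurement; LAMDA's point is to replace exactly this kind of blind search with learned, multi-stage guidance:

    // Random-search autotuner over invented knobs; each real evaluation
    // would be a full synthesis/place/route run taking hours.
    #include <cstdio>
    #include <map>
    #include <random>
    #include <string>

    using Config = std::map<std::string, int>;

    // Hypothetical stand-in for measuring quality-of-results (e.g., the
    // achieved clock period in ns) of one toolflow configuration.
    double evaluate_qor(const Config& c) {
        return 10.0 - 0.5 * c.at("placer_effort") + 0.3 * (c.at("seed") % 3);
    }

    int main() {
        std::mt19937 rng{1};
        std::uniform_int_distribution<int> effort(0, 3), seed(0, 9);

        Config best;
        double best_qor = 1e9;
        for (int trial = 0; trial < 20; ++trial) {  // exhaustive search infeasible
            Config c{{"placer_effort", effort(rng)}, {"seed", seed(rng)}};
            double q = evaluate_qor(c);
            if (q < best_qor) { best_qor = q; best = c; }
        }
        std::printf("best period %.2f ns (effort=%d, seed=%d)\n",
                    best_qor, best.at("placer_effort"), best.at("seed"));
        return 0;
    }
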
Jun 5

ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data

Parsing is essential for a wide range of use cases, such as stream processing, bulk loading, and in-situ querying of raw data. Yet, this compute-intensive step often constitutes a major bottleneck in the data ingestion pipeline, since parsing inputs that require more involved rules is challenging to parallelise. This work proposes a massively […]
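
The standard building block for parallelising delimiter splitting is a two-pass scheme: chunks count their delimiters independently, a prefix sum assigns each chunk a slot range, and a second pass writes field boundaries in parallel. A minimal OpenMP sketch for the trivial rule set (no quoting or escaping; fields governed by more involved rules that span chunk boundaries are exactly what makes the general problem hard):

    // Two-pass parallel delimiter splitting: count, prefix-sum, write.
    #include <algorithm>
    #include <cstdio>
    #include <numeric>
    #include <string>
    #include <vector>

    int main() {
        const std::string data = "a,bb,ccc,dddd,e,ff,ggg,h";
        const char delim = ',';
        const int chunks = 4;
        const size_t len = data.size(), per = (len + chunks - 1) / chunks;

        // Pass 1: each chunk counts its delimiters independently.
        std::vector<size_t> count(chunks + 1, 0);
        #pragma omp parallel for
        for (int c = 0; c < chunks; ++c)
            for (size_t i = c * per; i < std::min(len, (c + 1) * per); ++i)
                if (data[i] == delim) ++count[c + 1];

        // Prefix sum: count[c] becomes chunk c's first output slot.
        std::partial_sum(count.begin(), count.end(), count.begin());

        // Pass 2: write delimiter positions into disjoint slot ranges.
        std::vector<size_t> pos(count[chunks]);
        #pragma omp parallel for
        for (int c = 0; c < chunks; ++c) {
            size_t out = count[c];
            for (size_t i = c * per; i < std::min(len, (c + 1) * per); ++i)
                if (data[i] == delim) pos[out++] = i;
        }

        for (size_t p : pos) std::printf("%zu ", p);  // field boundaries
        std::printf("\n");
        return 0;
    }
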
Jun 5

Dynamic Distribution Pruning for Efficient Network Architecture Search

Network architectures obtained by Neural Architecture Search (NAS) have shown state-of-the-art performance in various computer vision tasks. Despite the exciting progress, the computational complexity of the forward-backward propagation and the search process makes it difficult to apply NAS in practice. In particular, most previous methods require thousands of GPU days for the search process to […]
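
The general pattern behind distribution-based search, in miniature: keep a probability per candidate operation, sample an architecture, score it, reinforce what worked, and prune candidates whose probability collapses so later steps search a smaller space. The scoring and update rule below are invented placeholders, not the paper's algorithm:

    // Distribution-guided search with pruning (placeholder reward/update).
    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
        std::vector<double> prob{0.25, 0.25, 0.25, 0.25};  // 4 candidate ops
        std::mt19937 rng{7};

        for (int step = 0; step < 200; ++step) {
            std::discrete_distribution<int> pick(prob.begin(), prob.end());
            int op = pick(rng);

            // Stand-in for training/evaluating the sampled architecture.
            double reward = (op == 2) ? 1.0 : 0.1;
            prob[op] *= 1.0 + 0.05 * reward;  // reinforce the sampled op

            double sum = 0;
            for (double p : prob) sum += p;
            for (double& p : prob) p /= sum;             // renormalize
            for (double& p : prob) if (p < 0.02) p = 0;  // prune unlikely ops
        }
        for (size_t i = 0; i < prob.size(); ++i)
            std::printf("op %zu: %.3f\n", i, prob[i]);
        return 0;
    }
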
Jun 5

Raising the Performance of the Tinker-HP Molecular Modeling Package on Intel’s HPC Architectures: a Living Review [Article v1.0]

This living paper reviews the present High Performance Computing (HPC) capabilities of the Tinker-HP molecular modeling package. We focus here on the reference, double-precision, massively parallel molecular dynamics engine in Tinker-HP that is dedicated to performing large-scale simulations. We show how it can be adapted to recent Intel Central Processing Unit (CPU) petascale […]
Jun 2

Classify QCD phase transition with deep learning

The state-of-the-art pattern recognition method in machine learning, the deep convolutional neural network, is used to identify the equation of state (EoS) employed in relativistic hydrodynamic simulations of heavy-ion collisions. High-level correlations of particle spectra in transverse momentum and azimuthal angle learned by the network act as an effective EoS-meter in deciphering the nature […]
Jun 2

The Accelerator Wall: Limits of Chip Specialization

Specializing chips using hardware accelerators has become the prime means of alleviating the gap between growing computational demands and the stagnating transistor budgets caused by the slowdown of CMOS scaling. Much of the benefit of chip specialization stems from optimizing a computational problem within a given chip’s transistor budget. Unfortunately, the stagnation of the […]
Jun 2

A Development Platform for Embedded Domain-Specific Languages

The use of domain-specific languages (DSLs) is a promising approach to helping programmers write efficient programs for high-performance computing. Programmers would find it difficult to write such a program by hand with only the low-level abstractions, such as arrays and loops, provided by a general-purpose language. This chapter presents our new implementation technique for domain-specific […]
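
A classic instance of this technique in C++ is expression templates: x = a + b + c reads like ordinary arithmetic, but the embedded DSL intercepts the expression and generates one fused loop with no temporaries. A minimal sketch (a generic illustration, not the chapter's implementation):

    // Embedded DSL via expression templates: `a + b + c` builds a lazy
    // expression tree; assignment evaluates it in a single fused loop.
    #include <cstdio>
    #include <vector>

    template <class L, class R>
    struct Add {                  // expression node, evaluated per element
        const L& l; const R& r;
        double operator[](size_t i) const { return l[i] + r[i]; }
    };

    struct Vec {
        std::vector<double> v;
        explicit Vec(size_t n, double x = 0) : v(n, x) {}
        double operator[](size_t i) const { return v[i]; }
        template <class E>
        Vec& operator=(const E& e) {  // one loop, no temporary vectors
            for (size_t i = 0; i < v.size(); ++i) v[i] = e[i];
            return *this;
        }
    };

    template <class L, class R>
    Add<L, R> operator+(const L& l, const R& r) { return {l, r}; }

    int main() {
        Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), x(4);
        x = a + b + c;                     // DSL expression, fused at evaluation
        std::printf("x[0] = %f\n", x[0]);  // 6.0
        return 0;
    }
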
Jun 2

Heterogeneous Resource-Elastic Scheduling for CPU+FPGA Architectures

Heterogeneous computing is a key strategy for meeting the requirements of many compute-intensive applications. Currently, however, CPU+FPGA platforms are commonly underutilized, as scheduling is often constrained to a run-to-completion model or to accelerating a single application at a time. To address this, this paper proposes heterogeneous resource-elastic scheduling for maximizing the utilization of both CPU […]
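
The utilization argument in miniature: under run-to-completion on a single resource, the other resource idles; letting both drain a shared task queue keeps them busy simultaneously. A toy sketch with the FPGA emulated by a thread (timings invented):

    // Two heterogeneous workers draining one queue instead of one resource
    // running to completion while the other idles. FPGA is emulated.
    #include <chrono>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>

    std::queue<int> tasks;
    std::mutex m;

    bool pop(int& t) {
        std::lock_guard<std::mutex> lk(m);
        if (tasks.empty()) return false;
        t = tasks.front(); tasks.pop(); return true;
    }

    void worker(const char* name, int cost_ms) {
        int t;
        while (pop(t)) {  // each resource pulls work as soon as it is free
            std::this_thread::sleep_for(std::chrono::milliseconds(cost_ms));
            std::printf("task %d done on %s\n", t, name);
        }
    }

    int main() {
        for (int i = 0; i < 8; ++i) tasks.push(i);
        std::thread cpu(worker, "CPU", 30), fpga(worker, "FPGA", 10);
        cpu.join(); fpga.join();
        return 0;
    }
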
Jun 2

Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models

We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective […]
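
From that description, a plausible concrete form of each worker's update, with notation assumed here rather than taken from the paper (x_i^t: worker i's parameters, \tilde{x}^t: parameters of the currently best-performing worker, \eta: step size, \lambda: coupling strength):

    % hedged sketch of the two-force update; symbols are assumptions
    x_i^{t+1} = x_i^t - \eta \, \nabla f_i(x_i^t) - \lambda \, (x_i^t - \tilde{x}^t)
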

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
