
Posts

Mar, 10

Distributed OpenMP Offloading of OpenMC on Intel GPU MAX Accelerators

Monte Carlo (MC) simulations play a pivotal role in diverse scientific and engineering domains, with applications ranging from nuclear physics to materials science. Harnessing the computational power of high-performance computing (HPC) systems, especially Graphics Processing Units (GPUs), has become essential for accelerating MC simulations. This paper focuses on the adaptation and optimization of the OpenMC […]
Mar, 10

Hybrid quantum programming with PennyLane Lightning on HPC platforms

We introduce PennyLane’s Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures and showcase the scale of problems that can be simulated using our tooling. We benchmark the performance of […]
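For readers unfamiliar with the Lightning suite, a minimal usage sketch follows. The device name "lightning.qubit" (the CPU simulator) and the adjoint differentiation method are part of PennyLane's public API; the circuit itself and its parameters are invented for illustration, and on GPU nodes the "lightning.gpu" backend can be selected instead.

```python
import pennylane as qml
from pennylane import numpy as pnp

# Lightning's CPU state-vector simulator; "lightning.gpu" targets NVIDIA GPUs
dev = qml.device("lightning.qubit", wires=4)

@qml.qnode(dev, diff_method="adjoint")  # adjoint differentiation is a Lightning strength
def circuit(params):
    for w in range(4):
        qml.RY(params[w], wires=w)
    for w in range(3):
        qml.CNOT(wires=[w, w + 1])
    return qml.expval(qml.PauliZ(0))

params = pnp.array([0.1, 0.2, 0.3, 0.4], requires_grad=True)
print(circuit(params))            # expectation value
print(qml.grad(circuit)(params))  # gradient computed via the adjoint method
```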
Mar, 10

SYCL-Bench 2020: Benchmarking SYCL 2020 on AMD, Intel, and NVIDIA GPUs

Today, the SYCL standard represents the most advanced programming model for heterogeneous computing, delivering productivity, portability, and performance in pure C++17. SYCL 2020, in particular, is a major revision that pushes the boundaries of heterogeneous programming by introducing a number of new features. As these features are implemented by existing compilers, it becomes […]
Mar, 10

Parallel Implementation of Lightweight Secure Hash Algorithm on CPU and GPU Environments

Currently, cryptographic hash functions are widely used in various applications, including message authentication codes, cryptographic random generators, digital signatures, key derivation functions, and post-quantum algorithms. Notably, they play a vital role in establishing secure communication between servers and clients. Specifically, servers often need to compute a large number of hash values simultaneously to provide smooth […]
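This serving scenario translates naturally to data parallelism over independent messages. Below is a minimal CPU-side sketch using Python's standard hashlib and multiprocessing; since hashlib does not provide the lightweight LSH hash family the paper studies, SHA-256 stands in, and the message set is invented for illustration.

```python
import hashlib
from multiprocessing import Pool

def digest(msg: bytes) -> str:
    # SHA-256 stands in here; the paper targets the LSH hash family
    return hashlib.sha256(msg).hexdigest()

if __name__ == "__main__":
    # e.g. many concurrent client sessions, each needing a hash
    msgs = [f"session-{i}".encode() for i in range(100_000)]
    with Pool() as pool:  # one worker process per CPU core by default
        hashes = pool.map(digest, msgs, chunksize=1024)
    print(len(hashes), hashes[0])
```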
Mar, 3

Low-Overhead Trace Collection and Profiling on GPU Compute Kernels

While GPUs can bring substantial speedups to compute-intensive tasks, their programming is notoriously hard. From the programming model to microarchitectural particularities, the programmer may encounter many pitfalls that can hinder performance in obscure ways. Numerous performance analysis tools provide helpful data on the efficiency of compute kernels, but few allow the programmer to efficiently […]
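One building block such profilers rely on is device-side event timing. The sketch below, which is illustrative and unrelated to the paper's own tooling, times a single kernel with CUDA events through CuPy; the matrix size and workload are arbitrary.

```python
import cupy as cp

a = cp.random.random((4096, 4096)).astype(cp.float32)

start, end = cp.cuda.Event(), cp.cuda.Event()
start.record()       # enqueue a timestamp before the kernel
b = a @ a            # kernel under measurement
end.record()         # enqueue a timestamp after the kernel
end.synchronize()    # wait until the GPU work has actually finished

print(f"{cp.cuda.get_elapsed_time(start, end):.3f} ms")
```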
Mar, 3

Using AI libraries for Incompressible Computational Fluid Dynamics

Recently, there has been a huge effort focused on developing highly efficient open source libraries to perform Artificial Intelligence (AI) related computations on different computer architectures (for example, CPUs, GPUs and new AI processors). This has not only made the algorithms based on these libraries highly efficient and portable between different architectures, but also has […]
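The core idea of reusing AI libraries for numerical kernels can be sketched in PyTorch: a finite-difference stencil becomes a convolution, so a Jacobi sweep for the Poisson equation runs through the same machinery as a neural-network layer. The grid size, spacing, and the zero-Dirichlet boundary implied by zero padding are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn.functional as F

# 5-point Laplacian stencil written as a convolution kernel
stencil = torch.tensor([[0., 1., 0.],
                        [1., 0., 1.],
                        [0., 1., 0.]]).reshape(1, 1, 3, 3)

def jacobi_step(u, f, h):
    # One Jacobi sweep for nabla^2 u = f: sum the four neighbours,
    # subtract the scaled source term, divide by the centre coefficient
    neighbours = F.conv2d(u, stencil, padding=1)  # zero padding ~ zero Dirichlet BC
    return (neighbours - h * h * f) / 4.0

u = torch.zeros(1, 1, 128, 128)   # initial guess
f = torch.ones(1, 1, 128, 128)    # source term
for _ in range(200):
    u = jacobi_step(u, f, h=1.0 / 128)
```

Because the whole loop is ordinary tensor code, moving u, f, and the stencil to an accelerator with .to('cuda') runs it unchanged on a GPU.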
Mar, 3

Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale

As research and deployment of AI grows, the computational burden to support and sustain its progress inevitably does too. To train or fine-tune state-of-the-art models in NLP, computer vision, etc., some form of AI hardware acceleration is virtually a requirement. Recent large language models require considerable resources to train and deploy, resulting in significant energy […]
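Power capping itself is exposed by vendor management libraries. As a rough sketch of the mechanism, not of the paper's HPC-scale tooling, the following uses NVIDIA's NVML through the pynvml bindings; the 250 W value is arbitrary, and setting a limit requires administrative privileges.

```python
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

# Allowed power-limit range, reported in milliwatts
lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(gpu)
print(f"supported cap range: {lo / 1000:.0f} W to {hi / 1000:.0f} W")

# Apply a 250 W cap (illustrative value; needs root privileges)
pynvml.nvmlDeviceSetPowerManagementLimit(gpu, 250_000)

pynvml.nvmlShutdown()
```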
Mar, 3

Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks

As the role of artificial intelligence becomes increasingly pivotal in modern society, the efficient training and deployment of deep neural networks have emerged as critical areas of focus. Recent advancements in attention-based large neural architectures have spurred the development of AI accelerators, facilitating the training of extensive, multi-billion parameter models. Despite their effectiveness, these powerful […]
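Spyx builds on JAX, and while its actual API is not reproduced here, the underlying pattern of JIT-compiling a spiking-neuron simulation loop can be sketched generically; the leaky integrate-and-fire dynamics, constants, and shapes below are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def lif_step(v, x, decay=0.9, threshold=1.0):
    # Leaky integrate-and-fire: decay the membrane potential, add input,
    # emit a spike and reset wherever the threshold is crossed
    v = decay * v + x
    spike = (v > threshold).astype(jnp.float32)
    return v - spike * threshold, spike

@jax.jit
def simulate(inputs):                      # inputs: (time_steps, n_neurons)
    v0 = jnp.zeros(inputs.shape[1])
    _, spikes = jax.lax.scan(lif_step, v0, inputs)
    return spikes

inputs = jax.random.uniform(jax.random.PRNGKey(0), (100, 32))
print(simulate(inputs).sum())              # total spike count
```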
Mar, 3

Parallel programming in mobile devices with FancyJCL

Mobile devices and handheld systems, such as the now-ubiquitous smartphones and tablets, are becoming increasingly powerful. Their basic hardware configuration is usually a state-of-the-art heterogeneous architecture consisting of multi-core processors and some kind of accelerator, such as a GPU or DSP. Code specifically adapted to the architecture is mandatory if high-performance computation is required, and low-level […]
Feb, 25

APPy: Annotated Parallelism for Python on GPUs

GPUs are increasingly being used to speed up Python applications in the scientific computing and machine learning domains. Currently, the two common approaches to leveraging GPU acceleration in Python are to 1) create a custom native GPU kernel and import it as a function callable from Python; or 2) use libraries such as […]
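The two approaches can be made concrete with CuPy, which happens to expose both in a few lines; the vector-add kernel and sizes are invented for illustration.

```python
import cupy as cp

# Approach 1: a hand-written CUDA kernel imported into Python
vec_add = cp.RawKernel(r'''
extern "C" __global__
void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
''', 'vec_add')

n = 1 << 20
a = cp.random.random(n).astype(cp.float32)
b = cp.random.random(n).astype(cp.float32)
c = cp.empty_like(a)
vec_add(((n + 255) // 256,), (256,), (a, b, c, cp.int32(n)))

# Approach 2: the same operation through a library's array API
assert cp.allclose(c, a + b)
```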
Feb, 25

Analyzing GPU Performance in Virtualized Environments: A Case Study

The graphics processing unit (GPU) plays a crucial role in boosting application performance and enhancing computational tasks. Thanks to its parallel architecture and energy efficiency, the GPU has become essential in many computing scenarios. On the other hand, the advent of GPU virtualization has been a significant breakthrough, as it provides scalable and adaptable GPU […]
Feb, 25

Assessing opportunities of SYCL for biological sequence alignment on GPU-based systems

Bioinformatics and computational biology are two fields that have been exploiting GPUs for more than two decades, with CUDA being the most widely used programming language for them. However, as CUDA is a proprietary NVIDIA language, it imposes strong portability restrictions across the wide range of heterogeneous architectures, such as AMD or Intel GPUs. To face […]
