Posts

Mar, 10

Hybrid quantum programming with PennyLane Lightning on HPC platforms

We introduce PennyLane’s Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures and showcase the scale of problems that can be simulated using our tooling. We benchmark the performance of […]
Mar, 3

Using AI libraries for Incompressible Computational Fluid Dynamics

Recently, there has been a huge effort focused on developing highly efficient open source libraries to perform Artificial Intelligence (AI) related computations on different computer architectures (for example, CPUs, GPUs and new AI processors). This has not only made the algorithms based on these libraries highly efficient and portable between different architectures, but also has […]
Mar, 3

Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale

As research and deployment of AI grows, the computational burden to support and sustain its progress inevitably does too. To train or fine-tune state-of-the-art models in NLP, computer vision, etc., some form of AI hardware acceleration is virtually a requirement. Recent large language models require considerable resources to train and deploy, resulting in significant energy […]
Mar, 3

Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks

As the role of artificial intelligence becomes increasingly pivotal in modern society, the efficient training and deployment of deep neural networks have emerged as critical areas of focus. Recent advancements in attention-based large neural architectures have spurred the development of AI accelerators, facilitating the training of extensive, multi-billion parameter models. Despite their effectiveness, these powerful […]
Mar, 3

Parallel programming in mobile devices with FancyJCL

Mobile devices and handheld systems, such as the now-ubiquitous smartphones and tablets, are becoming increasingly powerful. Their basic hardware configuration is usually a state-of-the-art heterogeneous architecture consisting of multi-core processors and some kind of accelerator such as a GPU or DSP. Specific code adapted to the architecture is mandatory if high-performance computation is required and low-level […]
Mar, 3

Low-Overhead Trace Collection and Profiling on GPU Compute Kernels

While GPUs can bring substantial speedup to compute-intensive tasks, their programming is notoriously hard. From the programming model to microarchitectural particularities, the programmer may encounter many pitfalls that hinder performance in obscure ways. Numerous performance analysis tools provide helpful data on the efficiency of the compute kernels, but few allow the programmer to efficiently […]
Feb, 25

APPy: Annotated Parallelism for Python on GPUs

GPUs are increasingly being used to speed up Python applications in the scientific computing and machine learning domains. Currently, the two common approaches to leveraging GPU acceleration in Python are 1) create a custom native GPU kernel, and import it as a function that can be called from Python; 2) use libraries such as […]
Feb, 25

Analyzing GPU Performance in Virtualized Environments: A Case Study

The graphics processing unit (GPU) plays a crucial role in boosting application performance and enhancing computational tasks. Thanks to its parallel architecture and energy efficiency, the GPU has become essential in many computing scenarios. On the other hand, the advent of GPU virtualization has been a significant breakthrough, as it provides scalable and adaptable GPU […]
Feb, 25

Assessing opportunities of SYCL for biological sequence alignment on GPU-based systems

Bioinformatics and computational biology are two fields that have been exploiting GPUs for more than two decades, with CUDA being the most widely used programming language for them. However, as CUDA is an NVIDIA proprietary language, it imposes a strong portability restriction across a wide range of heterogeneous architectures, like AMD or Intel GPUs. To face […]
Feb, 25

Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures

Deep Learning (DL) frameworks such as PyTorch and TensorFlow include runtime infrastructures responsible for executing trained models on target hardware, managing memory, data transfers, and multi-accelerator execution, if applicable. Additionally, it is a common practice to deploy pre-trained models on environments distinct from their native development settings. This led to the introduction of interchange formats […]
Feb, 25

Benchmarking and Dissecting the Nvidia Hopper GPU Architecture

Graphics processing units (GPUs) are continually evolving to cater to the computational demands of contemporary general-purpose workloads, particularly those driven by artificial intelligence (AI) utilizing deep learning techniques. A substantial body of studies has been dedicated to dissecting the microarchitectural metrics characterizing diverse GPU generations, which helps researchers understand the hardware details and leverage them […]
Feb, 18

Graphtoy: Fast Software Simulation of Applications for AMD’s AI Engines

This work presents Graphtoy, a coroutine-based compute graph simulator built in C++20, which can be embedded into a target application for rapid step-by-step prototyping of graphs targeting AMD’s AI Engines, as used in Versal FPGAs and Ryzen 7040 CPUs. By using a molecular docking application as a case study, we demonstrate: 1) how compute graphs […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
