Posts
Mar, 18
Fast Truncated SVD of Sparse and Dense Matrices on Graphics Processors
We investigate the solution of low-rank matrix approximation problems using the truncated SVD. For this purpose, we develop and optimize GPU implementations for the randomized SVD and a blocked variant of the Lanczos approach. Our work takes advantage of the fact that the two methods are composed of very similar linear algebra building blocks, which […]
Mar, 18
MUPPET: Optimizing Performance in OpenMP via Mutation Testing
Performance optimization continues to be a challenge in modern HPC software. Existing performance optimization techniques, including profiling-based and auto-tuning techniques, fail to indicate program modifications at the source level, thus preventing their portability across compilers. This paper describes Muppet, a new approach that identifies program modifications called mutations aimed at improving program performance. Muppet’s mutations […]
Mar, 18
SYCL in the edge: performance and energy evaluation for heterogeneous acceleration
Edge computing is essential to handle increasing data volumes and processing capacities. It provides real-time and secure data processing near data sources, like smart devices, alleviating cloud computing energy use and saving network bandwidth. Specialized accelerators, like GPUs and FPGAs, are vital for low-latency edge computing, but the requirement to customize code for different hardware […]
Mar, 18
Predicting GPUDirect Benefits for HPC Workloads
Graphics processing units (GPUs) are becoming increasingly popular in modern HPC systems. Hardware for data movement to and from GPUs such as NVLink and GPUDirect has reduced latencies, increased throughput, and eliminated redundant copies. In this work, we use discrete event simulations to explore the impact of different communication paradigms on the messaging performance of […]
Mar, 10
FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators
NVIDIA Tensor Cores and AMD Matrix Cores (together called Matrix Accelerators) are of growing interest in high-performance computing and machine learning owing to their high performance. Unfortunately, their numerical behaviors are not publicly documented, including the number of extra precision bits maintained, the accumulation order of addition, and predictable subnormal number handling during computations. This […]
Mar, 10
Distributed OpenMP Offloading of OpenMC on Intel GPU MAX Accelerators
Monte Carlo (MC) simulations play a pivotal role in diverse scientific and engineering domains, with applications ranging from nuclear physics to materials science. Harnessing the computational power of high-performance computing (HPC) systems, especially Graphics Processing Units (GPUs), has become essential for accelerating MC simulations. This paper focuses on the adaptation and optimization of the OpenMC […]
Mar, 10
Hybrid quantum programming with PennyLane Lightning on HPC platforms
We introduce PennyLane’s Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are implemented to demonstrate the supported classical computing architectures and showcase the scale of problems that can be simulated using our tooling. We benchmark the performance of […]
Mar, 10
SYCL-Bench 2020: Benchmarking SYCL 2020 on AMD, Intel, and NVIDIA GPUs
Today, the SYCL standard represents the most advanced programming model for heterogeneous computing, delivering productivity, portability, and performance in pure C++17. SYCL 2020, in particular, represents a major enhancement that pushes the boundaries of heterogeneous programming by introducing a number of new features. As the new features are implemented by existing compilers, it becomes […]
Mar, 10
Parallel Implementation of Lightweight Secure Hash Algorithm on CPU and GPU Environments
Currently, cryptographic hash functions are widely used in various applications, including message authentication codes, cryptographic random generators, digital signatures, key derivation functions, and post-quantum algorithms. Notably, they play a vital role in establishing secure communication between servers and clients. Specifically, servers often need to compute a large number of hash functions simultaneously to provide smooth […]
Mar, 3
Using AI libraries for Incompressible Computational Fluid Dynamics
Recently, there has been a huge effort focused on developing highly efficient open source libraries to perform Artificial Intelligence (AI) related computations on different computer architectures (for example, CPUs, GPUs and new AI processors). This has not only made the algorithms based on these libraries highly efficient and portable between different architectures, but also has […]
Mar, 3
Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale
As research and deployment of AI grow, the computational burden to support and sustain its progress inevitably does too. To train or fine-tune state-of-the-art models in NLP, computer vision, etc., some form of AI hardware acceleration is virtually a requirement. Recent large language models require considerable resources to train and deploy, resulting in significant energy […]
Mar, 3
Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks
As the role of artificial intelligence becomes increasingly pivotal in modern society, the efficient training and deployment of deep neural networks have emerged as critical areas of focus. Recent advancements in attention-based large neural architectures have spurred the development of AI accelerators, facilitating the training of extensive, multi-billion parameter models. Despite their effectiveness, these powerful […]