Posts
Dec, 29
TorchQC – A framework for efficiently integrating machine and deep learning methods in quantum dynamics and control
Machine learning has been revolutionizing our world over the last few years and is also increasingly exploited in several areas of physics, including quantum dynamics and control. The need for a framework that brings together machine learning models and quantum simulation methods has been quite high within the quantum control field, with the ultimate goal of exploiting these […]
Dec, 29
A survey on FPGA-based accelerator for ML models
This paper thoroughly surveys the acceleration of machine learning (ML) algorithms on hardware accelerators, focusing on Field-Programmable Gate Arrays (FPGAs). It reviews 287 out of 1138 papers from the past six years, sourced from four top FPGA conferences. This selection underscores the increasing integration of ML and FPGA technologies and their mutual importance in technological advancement. Research […]
Dec, 29
Development of a new framework for high performance volunteer computing
The majority of Volunteer Computing (VC) projects are based on the Berkeley Open Infrastructure for Network Computing (BOINC) framework. BOINC is an open-source middleware system designed to support a variety of volunteer computing projects across multiple scientific disciplines, including molecular biology, mathematics, cryptography, linguistics, and astrophysics. However, it is worth noting that BOINC primarily supports […]
Dec, 29
Asynchronous-Many-Task Systems: Challenges and Opportunities – Scaling an AMR Astrophysics Code on Exascale machines using Kokkos and HPX
Dynamic and adaptive mesh refinement is pivotal in high-resolution, multi-physics, multi-model simulations, which require resolving the physics precisely in localized areas across expansive domains. The extreme heterogeneity of today's supercomputers presents a significant challenge for dynamically adaptive codes, highlighting the importance of achieving performance portability at scale. Our research focuses on astrophysical simulations, particularly stellar mergers, to elucidate early […]
Dec, 24
Utilizing Tensor Cores in Futhark
Modern hardware has become more heterogeneous, and with the AI boom, specialized hardware, especially for performing matrix multiplication, has become readily available. In NVIDIA graphics processing units (GPUs), Tensor Cores allow for efficient execution of matrix multiplication routines that can significantly speed up AI and deep learning operations, as well as other programs containing matrix […]
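For readers unfamiliar with how Tensor Cores are typically programmed, the sketch below shows the warp-level WMMA path exposed by CUDA, in which a single warp multiplies one 16x16x16 half-precision tile. It is an illustrative example only, with hypothetical names, and is not taken from the Futhark compiler work described above.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Illustrative only: one warp multiplies a 16x16x16 tile of half-precision
// matrices on Tensor Cores via the WMMA API and accumulates in float.
__global__ void wmma_tile_mm(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);      // start from a zero accumulator
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension of A is 16
    wmma::load_matrix_sync(b_frag, b, 16);  // leading dimension of B is 16
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C = A * B + C on Tensor Cores
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// One warp handles one tile, e.g. wmma_tile_mm<<<1, 32>>>(dA, dB, dC);
```

A compiler targeting Tensor Cores, such as the Futhark backend discussed in the post, would ultimately have to emit code against this kind of low-level interface (or the equivalent PTX instructions); the tile size and layouts above are just one of the shapes the hardware supports.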
Dec, 24
CPPJoules: An Energy Measurement Tool for C++
With the increasing complexity of modern software and the demand for high performance, energy consumption has become a critical factor for developers and researchers. While much of the research community is focused on evaluating the energy consumption of machine learning and artificial intelligence systems — often implemented in Python — there is a gap when […]
Dec, 24
Reproducible Study and Performance Analysis of GPU Programming Paradigms: OpenACC vs. CUDA in Key Linear Algebra Computations
Scientific and engineering problems are frequently governed by partial differential equations; however, the analytical solutions of these equations are often impractical, thereby forcing the adoption of numerical methods. Basic Linear Algebra Subprograms (BLAS) operations constitute a fundamental component of these numerical approaches, incorporating essential tasks such as Level 1 operations (dot products and vector addition), […]
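As a point of reference for the Level 1 operations mentioned above, a hand-written CUDA version of AXPY (y = a*x + y) looks roughly like the sketch below; the kernel and launch parameters are illustrative and are not taken from the study itself.

```cuda
#include <cuda_runtime.h>

// Illustrative Level 1 BLAS operation (AXPY: y = a*x + y) as a plain CUDA kernel.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];   // one element per thread
}

// Example launch with 256 threads per block (d_x and d_y are device pointers):
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```

In OpenACC the same loop would instead be annotated with a parallel-loop directive and left to the compiler to map onto the GPU, which is the kind of programmability-versus-performance trade-off such comparisons typically evaluate.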
Dec, 24
Accelerating Sparse Graph Neural Networks with Tensor Core Optimization
Graph neural networks (GNNs) have seen extensive application in domains such as social networks, bioinformatics, and recommendation systems. However, the irregularity and sparsity of graph data challenge traditional computing methods, which are insufficient to meet the performance demands of GNNs. Recent research has explored parallel acceleration using CUDA Cores and Tensor Cores, but significant challenges […]
Dec, 24
HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages
Large Language Model (LLM)-based coding tools have been tremendously successful as software development assistants, yet they are often designed for general-purpose programming tasks and perform poorly for more specialized domains such as high performance computing. Creating specialized models and tools for these domains is crucial to gaining the benefits of LLMs in areas […]
Dec, 15
Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with Protein Database Search
The high-performance computing (HPC) landscape is undergoing rapid transformation, with an increasing emphasis on energy-efficient and heterogeneous computing environments. This comprehensive study extends our previous research on SYCL’s performance portability by evaluating its effectiveness across a broader spectrum of computing architectures, including CPUs, GPUs, and hybrid CPU-GPU configurations from NVIDIA, Intel, and AMD. Our analysis […]
Dec, 15
Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems
The exponential growth of data-intensive machine learning workloads has exposed significant limitations in conventional GPU-accelerated systems, especially when processing datasets exceeding GPU DRAM capacity. We propose MQMS, an augmented in-storage GPU architecture and simulator that is aware of internal SSD states and operations, enabling intelligent scheduling and address allocation to overcome performance bottlenecks caused by […]
Dec, 15
RTCUDB: Building Databases with RT Processors
A spectrum of new hardware has been studied to accelerate database systems in the past decade. In particular, CUDA cores have benefited from the rapid development of GPUs and delivered notable performance improvements. The state-of-the-art GPU-based implementation, i.e., Crystal, can achieve up to 61 times higher performance than CPU-based implementations. However, experiments show that […]