Posts
Dec, 15
Leveraging the potential of task-based programming with OpenMP task graphs
The task execution model is widely used in computer engineering: it helps developers design, develop, and understand software systems. OpenMP is the de facto programming model for parallelizing sequential algorithms on shared-memory machines. Coupled with task parallelism, OpenMP can conveniently parallelize both structured and unstructured applications, and it also allows users to offload work […]
Dec, 15
Deep Learning Model Security: Threats and Defenses
Deep learning has transformed AI applications but faces critical security challenges, including adversarial attacks, data poisoning, model theft, and privacy leakage. This survey examines these vulnerabilities, detailing their mechanisms and impact on model integrity and confidentiality. Practical implementations, including adversarial examples, label flipping, and backdoor attacks, are explored alongside defenses such as adversarial training, differential […]
Dec, 8
LLOR: Automated Repair of OpenMP Programs
In this paper, we present a technique for repairing data race errors in parallel programs written in C/C++ and Fortran using the OpenMP API. Our technique can also remove barriers that are deemed unnecessary for correctness. We implement these ideas in our tool called LLOR, which takes a language-independent approach to provide appropriate placements of […]
Dec, 8
Guardian: Safe GPU Sharing in Multi-Tenant Environments
Modern GPU applications, such as machine learning (ML), can only partially utilize GPUs, leading to GPU underutilization in cloud environments. Sharing GPUs across multiple applications from different tenants can improve resource utilization and, consequently, cost, energy, and power efficiency. However, GPU sharing creates memory safety concerns because kernels must share a single GPU address space. […]
Dec, 8
Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
Deep Neural Networks (DNNs) have revolutionized various fields, but their deployment on GPUs often leads to significant energy consumption. Unlike existing methods for reducing GPU energy consumption, which are either hardware-inflexible or limited by workload constraints, this paper addresses the problem at the GPU kernel level. We propose a novel search-based compilation method to generate […]
Dec, 8
Unified schemes for directive-based GPU offloading
The GPU is the dominant accelerator device due to its high performance and energy efficiency. Directive-based GPU offloading using OpenACC or OpenMP target is a convenient way to port existing codes originally developed for multicore CPUs. Although OpenACC and OpenMP target provide similar features, each method has pros and cons. OpenACC has better functions and an […]
Dec, 8
FortranX: Harnessing Code Generation, Portability, and Heterogeneity in Fortran
Due to its historical popularity, Fortran was used to implement many important scientific applications. The complexity of these applications, along with the transition to modern high-performance languages like C++, has made them challenging to modernize and optimize. Significant development time is incurred to understand and optimize key algorithms as well as leverage new […]
Dec, 1
CLUEstering: a high-performance density-based clustering library for scientific computing
Clustering is a computational technique that classifies objects based on their similarity and is widely used in many branches of science, for instance in image segmentation, medical imaging, the study of complex systems, machine learning, and high-energy physics. As the amount of data collected in every field of research increases, techniques like […]
Dec, 1
PyOMP: Parallel programming for CPUs and GPUs with OpenMP and Python
Python is the most popular programming language. OpenMP is the most popular parallel programming API. Projecting OpenMP into Python will help expand the HPC community. We call our Python-based OpenMP system PyOMP. In this short paper we describe PyOMP and its use for parallel programming for CPUs and GPUs. We describe its implementation through the […]
Dec, 1
Hardware Accelerators for Artificial Intelligence
In this chapter, we provide an in-depth exploration of the specialized hardware accelerators designed to enhance Artificial Intelligence (AI) applications, focusing on their necessity, development, and impact on the field of AI. It covers the transition from traditional computing systems to advanced AI-specific hardware, addressing the growing demands of AI algorithms and the […]
Dec, 1
Scaling SU(2) to 1000 GPUs using HiRep
HiRep allows flexible simulations of higher representations of Wilson Fermions with various actions and gauge groups and a range of inverters and integrators. This is particularly important for enabling evaluations of observables relevant to phenomenological inputs for Beyond-the-Standard-Model physics from lattice field theory. We present progress on the GPU porting of available features, especially in […]
Dec, 1
Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach
We present an analytical framework for predicting General Matrix Multiplication (GEMM) performance on modern GPUs, focusing on runtime, power consumption, and energy efficiency. Our study employs two approaches: a custom-implemented tiled matrix multiplication kernel for fundamental analysis, and NVIDIA’s CUTLASS library for comprehensive performance data collection across advanced configurations. Using the NVIDIA RTX 4070 as our experimental platform, […]