
Posts

Dec, 15

Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with Protein Database Search

The high-performance computing (HPC) landscape is undergoing rapid transformation, with an increasing emphasis on energy-efficient and heterogeneous computing environments. This comprehensive study extends our previous research on SYCL’s performance portability by evaluating its effectiveness across a broader spectrum of computing architectures, including CPUs, GPUs, and hybrid CPU-GPU configurations from NVIDIA, Intel, and AMD. Our analysis […]
Dec, 15

Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems

The exponential growth of data-intensive machine learning workloads has exposed significant limitations in conventional GPU-accelerated systems, especially when processing datasets exceeding GPU DRAM capacity. We propose MQMS, an augmented in-storage GPU architecture and simulator that is aware of internal SSD states and operations, enabling intelligent scheduling and address allocation to overcome performance bottlenecks caused by […]
Dec, 15

RTCUDB: Building Databases with RT Processors

Over the past decade, a spectrum of new hardware has been studied for accelerating database systems. In particular, CUDA cores have benefited from the rapid development of GPUs and deliver notable performance improvements. The state-of-the-art GPU-based implementation, i.e., Crystal, can achieve up to 61 times higher performance than CPU-based implementations. However, experiments show that […]
Dec, 15

Leveraging the potential of task-based programming with OpenMP task graphs

The task execution model is widely used in computer engineering; it helps developers design, develop, and understand software systems. OpenMP is the de facto programming model for parallelizing sequential algorithms on shared-memory machines. Coupled with task parallelism, OpenMP can conveniently parallelize both structured and unstructured applications, and it also allows users to offload work […]
Dec, 15

Deep Learning Model Security: Threats and Defenses

Deep learning has transformed AI applications but faces critical security challenges, including adversarial attacks, data poisoning, model theft, and privacy leakage. This survey examines these vulnerabilities, detailing their mechanisms and impact on model integrity and confidentiality. Practical implementations, including adversarial examples, label flipping, and backdoor attacks, are explored alongside defenses such as adversarial training, differential […]
Dec, 8

LLOR: Automated Repair of OpenMP Programs

In this paper, we present a technique for repairing data race errors in parallel programs written in C/C++ and Fortran using the OpenMP API. Our technique can also remove barriers that are deemed unnecessary for correctness. We implement these ideas in our tool called LLOR, which takes a language-independent approach to provide appropriate placements of […]
Dec, 8

Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach

Deep Neural Networks (DNNs) have revolutionized various fields, but their deployment on GPUs often leads to significant energy consumption. Unlike existing methods for reducing GPU energy consumption, which are either hardware-inflexible or limited by workload constraints, this paper addresses the problem at the GPU kernel level. We propose a novel search-based compilation method to generate […]
Dec, 8

Unified schemes for directive-based GPU offloading

GPUs are the dominant accelerator devices due to their high performance and energy efficiency. Directive-based GPU offloading using OpenACC or the OpenMP target construct is a convenient way to port existing codes originally developed for multicore CPUs. Although OpenACC and OpenMP target provide similar features, both methods have pros and cons. OpenACC has better functions and an […]
Dec, 8

FortranX: Harnessing Code Generation, Portability, and Heterogeneity in Fortran

Due to its historical popularity, Fortran was used to implement many important scientific applications. The complexity of these applications, along with the transition to modern high-performance languages like C++, has made modernizing and optimizing them challenging. Significant development time is incurred to understand and optimize key algorithms, as well as to leverage new […]
Dec, 8

Guardian: Safe GPU Sharing in Multi-Tenant Environments

Modern GPU applications, such as machine learning (ML), can only partially utilize GPUs, leading to GPU underutilization in cloud environments. Sharing GPUs across multiple applications from different tenants can improve resource utilization and consequently cost, energy, and power efficiency. However, GPU sharing creates memory safety concerns because kernels must share a single GPU address space. […]
Dec, 1

CLUEstering: a high-performance density-based clustering library for scientific computing

Clustering is a computational technique that classifies objects based on their similarity, and it is nowadays widely used across many branches of science, for instance in image segmentation, medical imaging, the study of complex systems, machine learning, and high-energy physics. As the amount of data collected in every field of research increases, techniques like […]
Dec, 1

PyOMP: Parallel programming for CPUs and GPUs with OpenMP and Python

Python is the most popular programming language. OpenMP is the most popular parallel programming API. Projecting OpenMP into Python will help expand the HPC community. We call our Python-based OpenMP system PyOMP. In this short paper we describe PyOMP and its use for parallel programming for CPUs and GPUs. We describe its implementation through the […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors