high performance computing on graphics processing units: hgpu.org

Posts

Dec, 15

RTCUDB: Building Databases with RT Processors

A spectrum of new hardware has been studied to accelerate database systems in the past decade. Specifically, CUDA cores are known to benefit from the fast development of GPUs and make notable performance improvements. The state-of-the-art GPU-based implementation, i.e., Crystal, can achieve up to 61 times higher performance than CPU-based implementations. However, experiments show that […]

CUDA

Dec, 15

Leveraging the potential of task-based programming with OpenMP task graphs

The task execution model is widely used in computer engineering, it helps developers to design, develop and understand software systems. OpenMP is the de-facto programming model to parallelize sequential algorithms on shared-memory machines. Coupled with the task parallelization, OpenMP is able to conveniently parallelize structured and non-structured applications, it also allows users to offload work […]

CUDA

Dec, 15

Deep Learning Model Security: Threats and Defenses

Deep learning has transformed AI applications but faces critical security challenges, including adversarial attacks, data poisoning, model theft, and privacy leakage. This survey examines these vulnerabilities, detailing their mechanisms and impact on model integrity and confidentiality. Practical implementations, including adversarial examples, label flipping, and backdoor attacks, are explored alongside defenses such as adversarial training, differential […]

Dec, 15

Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with Protein Database Search

The high-performance computing (HPC) landscape is undergoing rapid transformation, with an increasing emphasis on energy-efficient and heterogeneous computing environments. This comprehensive study extends our previous research on SYCL’s performance portability by evaluating its effectiveness across a broader spectrum of computing architectures, including CPUs, GPUs, and hybrid CPU-GPU configurations from NVIDIA, Intel, and AMD. Our analysis […]

CUDA

Dec, 8

LLOR: Automated Repair of OpenMP Programs

In this paper, we present a technique for repairing data race errors in parallel programs written in C/C++ and Fortran using the OpenMP API. Our technique can also remove barriers that are deemed unnecessary for correctness. We implement these ideas in our tool called LLOR, which takes a language-independent approach to provide appropriate placements of […]

Dec, 8

Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach

Deep Neural Networks (DNNs) have revolutionized various fields, but their deployment on GPUs often leads to significant energy consumption. Unlike existing methods for reducing GPU energy consumption, which are either hardware-inflexible or limited by workload constraints, this paper addresses the problem at the GPU kernel level. We propose a novel search-based compilation method to generate […]

CUDA

Dec, 8

Unified schemes for directive-based GPU offloading

GPU is the dominant accelerator device due to its high performance and energy efficiency. Directive-based GPU offloading using OpenACC or OpenMP target is a convenient way to port existing codes originally developed for multicore CPUs. Although OpenACC and OpenMP target provide similar features, both methods have pros and cons. OpenACC has better functions and an […]

Dec, 8

FortranX: Harnessing Code Generation, Portability, and Heterogeneity in Fortran

Due to its historical popularity, Fortran was used to implement many important scientific applications. The complexity of these applications along with the transition to modern high performance languages like C++ has made modernization and optimization challenging for these applications. Significant development time is incurred to understand and optimize key algorithms as well as leverage new […]

CUDA

•

OpenCL

Dec, 8

Guardian: Safe GPU Sharing in Multi-Tenant Environments

Modern GPU applications, such as machine learning (ML), can only partially utilize GPUs, leading to GPU underutilization in cloud environments. Sharing GPUs across multiple applications from different tenants can improve resource utilization and consequently cost, energy, and power efficiency. However, GPU sharing creates memory safety concerns because kernels must share a single GPU address space. […]

CUDA

Dec, 1

CLUEstering: a high-performance density-based clustering library for scientific computing

Clustering is a computational technique that aims at classifying objects based on their similarity, and is widely used in many branches of science nowadays, for instance in image segmentation, medical imaging, study of complex systems, machine learning techniques and high-energy physics. As the amount of data collected in every field of research increases, techniques like […]

CUDA

Dec, 1

PyOMP: Parallel programming for CPUs and GPUs with OpenMP and Python

Python is the most popular programming language. OpenMP is the most popular parallel programming API. Projecting OpenMP into Python will help expand the HPC community. We call our Python-based OpenMP system PyOMP. In this short paper we describe PyOMP and its use for parallel programming for CPUs and GPUs. We describe its implementation through the […]

Dec, 1

Hardware Accelerators for Artificial Intelligence

In this chapter, we aim to explore an in-depth exploration of the specialized hardware accelerators designed to enhance Artificial Intelligence (AI) applications, focusing on their necessity, development, and impact on the field of AI. It covers the transition from traditional computing systems to advanced AI-specific hardware, addressing the growing demands of AI algorithms and the […]

* * *

high performance computing on graphics processing units: hgpu.org

Posts

RTCUDB: Building Databases with RT Processors

Leveraging the potential of task-based programming with OpenMP task graphs

Deep Learning Model Security: Threats and Defenses

Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with Protein Database Search

LLOR: Automated Repair of OpenMP Programs

Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach

Unified schemes for directive-based GPU offloading

FortranX: Harnessing Code Generation, Portability, and Heterogeneity in Fortran

Guardian: Safe GPU Sharing in Multi-Tenant Environments

CLUEstering: a high-performance density-based clustering library for scientific computing

PyOMP: Parallel programming for CPUs and GPUs with OpenMP and Python

Hardware Accelerators for Artificial Intelligence

Recent source codes

Kernel Library for LLM Serving

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Genten: Software for Generalized Tensor Decompositions by Sandia National Laboratories

Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR

Pinocchio: PINpointing Orbit Crossing Collapsed Hierarchical Objects

KernelCoder: trained on a curated dataset of reasoning traces and CUDA kernel pairs

VibeCodeHPC - Multi Agentic Vibe Coding for HPC

Compile-Time Resource Safety for GPU APIs: A Low-Overhead Typestate Framework

exa-AMD: Exascale Accelerated Materials Discovery

Most viewed papers (last 30 days)