Posts

Sep, 22

Challenging Portability Paradigms: FPGA Acceleration Using SYCL and OpenCL

As the interest in FPGA-based accelerators for HPC applications increases, new challenges also arise, especially concerning different programming and portability issues. This paper aims to provide a snapshot of the current state of the FPGA tooling and its problems. To do so, we evaluate the performance portability of two frameworks for developing FPGA solutions for […]
Sep, 22

Collection skeletons: declarative abstractions for data collections

Modern programming languages provide programmers with rich abstractions for data collections as part of their standard libraries, e.g., Containers in the C++ STL, the Java Collections Framework, or the Scala Collections API. Typically, these collections frameworks are organised as hierarchies that provide programmers with common abstract data types (ADTs) like lists, queues, and stacks. While […]
Sep, 22

RenderKernel: High-level programming for real-time rendering systems

Real-time rendering applications leverage heterogeneous computing to optimize performance. However, software development across multiple devices presents challenges, including data layout inconsistencies, synchronization issues, resource management complexities, and architectural disparities. Additionally, building such systems has relied on verbose and unsafe programming models. Recent developments in domain-specific and unified shading languages aim to mitigate these issues. Yet, […]
Sep, 22

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2) incorporating powerful parallel computing devices such as GPUs, FPGAs, and other accelerators; and 3) utilizing special parallel architectures like Single Instruction/Multiple Data (SIMD). Many researchers […]
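A minimal sketch of approach 1 from the abstract, multithreading on a multi-core CPU with OpenMP. The `parallel_sum` function is a stand-in example, not from the paper; compiled without `-fopenmp` the pragma is ignored and the loop runs sequentially with the same result.

```cpp
#include <vector>

// Approach 1: multithreaded reduction with OpenMP. Each thread sums a
// chunk of the vector; the reduction clause combines the partial sums.
// Build with e.g. `g++ -fopenmp` to enable the parallelism.
double parallel_sum(const std::vector<double>& v) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)v.size(); ++i)
        sum += v[i];
    return sum;
}
```

Approaches 2 (GPU offload) and 3 (SIMD) typically start from the same loop and change only how iterations are mapped onto hardware.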
Sep, 15

DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL

We present an assignment for a full Parallel Computing course. Since 2017/2018, we have proposed a different problem each academic year to illustrate various methodologies for approaching the same computational problem using different parallel programming models. They are designed to be parallelized using shared-memory programming with OpenMP, distributed-memory programming with MPI, and GPU programming with […]
Sep, 15

Optimal Workload Placement on Multi-Instance GPUs

There is a pressing need to optimize the usage of Graphics Processing Units (GPUs), which have arguably become one of the most expensive and sought-after IT resources. To help with this goal, several of the current generation of GPUs support a partitioning feature called Multi-Instance GPU (MIG), which allows multiple workloads to share […]
Sep, 15

Enhancing Code Portability, Problem Scale, and Storage Efficiency in Exascale Applications

The growing diversity of hardware and software stacks adds development challenges to high-performance software as we move to exascale systems. Re-engineering software for each new platform is no longer practical due to increasing heterogeneity. Hardware designers are prioritizing AI/ML features like reduced precision that increase performance but sacrifice accuracy. The growing scale of […]
Sep, 15

Refining HPCToolkit for application performance analysis at exascale

As part of the US Department of Energy’s Exascale Computing Project (ECP), Rice University has been refining its HPCToolkit performance tools to better support measurement and analysis of applications executing on exascale supercomputers. To efficiently collect performance measurements of GPU-accelerated applications, HPCToolkit employs novel non-blocking data structures to communicate performance measurements between tool threads and […]
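HPCToolkit's actual data structures are more elaborate than anything shown here, but the non-blocking hand-off the abstract describes can be illustrated with a single-producer/single-consumer ring buffer: an application thread pushes measurement records while a tool thread drains them, and neither side ever takes a lock. `SpscRing` and its record type are hypothetical names for illustration only.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Illustrative lock-free SPSC queue: one producer thread calls try_push,
// one consumer thread calls try_pop. Indices grow monotonically; the
// acquire/release pairs make each slot's contents visible to the reader
// before the index update that publishes it.
template <typename T, std::size_t N>
class SpscRing {
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
public:
    bool try_push(const T& v) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h - tail_.load(std::memory_order_acquire) == N)
            return false;  // full: drop or retry, but never block
        buf_[h % N] = v;
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
    std::optional<T> try_pop() {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire))
            return std::nullopt;  // empty
        T v = buf_[t % N];
        tail_.store(t + 1, std::memory_order_release);
        return v;
    }
};
```

The appeal for a profiler is that the instrumented application thread pays a bounded, lock-free cost per measurement regardless of what the tool thread is doing.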
Sep, 15

Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee

Currently, the Weather Research and Forecasting model (WRF) utilizes shared-memory (OpenMP) and distributed-memory (MPI) parallelism. To take advantage of GPU resources on the Perlmutter supercomputer at NERSC, we port parts of the computationally expensive routines of the Fast Spectral Bin Microphysics (FSBM) microphysical scheme to NVIDIA GPUs using OpenMP device offloading directives. To […]
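A hedged sketch of the porting pattern the abstract names: an OpenMP `target` directive offloads a loop nest to the GPU when an offload-capable compiler and runtime are available, and the pragma is simply ignored (running on the host) otherwise. The FSBM routines are far larger; this `saxpy` stand-in is not from the paper.

```cpp
#include <vector>

// OpenMP device offloading: `target teams distribute parallel for`
// launches the loop on the device; the map clauses describe which
// arrays move to and from device memory.
void saxpy(float a, std::vector<float>& x, const std::vector<float>& y) {
    float* xp = x.data();
    const float* yp = y.data();
    long n = (long)x.size();
    #pragma omp target teams distribute parallel for \
        map(tofrom: xp[0:n]) map(to: yp[0:n])
    for (long i = 0; i < n; ++i)
        xp[i] = a * xp[i] + yp[i];
}
```

In practice the porting effort goes into choosing which loops to offload and minimizing host-device data movement, which is where a tool like Codee assists.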
Sep, 1

Owl: Differential-based Side-Channel Leakage Detection for CUDA Applications

Over the past decade, various methods for detecting side-channel leakage have been proposed and proven to be effective against CPU side-channel attacks. These methods are valuable in assisting developers to identify and patch side-channel vulnerabilities. Nevertheless, recent research has revealed the feasibility of exploiting side-channel vulnerabilities to steal sensitive information from GPU applications, which are […]
Sep, 1

VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing

The rapid advancement of Artificial Intelligence (AI) necessitates significant enhancements in the energy efficiency of Graphics Processing Units (GPUs) for Deep Neural Network (DNN) workloads. Such a challenge is particularly critical for embedded GPUs, which operate within stringent power constraints. Traditional GPU architectures, designed to support a limited set of numeric formats, face challenges in […]
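The general idea behind register operand packing can be illustrated with a "SIMD within a register" (SWAR) sketch: two 16-bit lanes share one 32-bit register, so a single integer add performs both lane additions at once. VitBit's actual packing scheme for embedded GPUs is not reproduced here; this assumes lane values stay small enough that no carry crosses the 16-bit boundary.

```cpp
#include <cstdint>

// Pack two 16-bit operands into one 32-bit word.
static inline uint32_t pack16(uint16_t lo, uint16_t hi) {
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

// One 32-bit add performs both 16-bit additions, provided neither
// lane overflows into its neighbour.
static inline uint32_t add_packed16(uint32_t a, uint32_t b) {
    return a + b;
}

// Extract lane i (0 = low, 1 = high) from a packed word.
static inline uint16_t lane(uint32_t p, int i) {
    return (uint16_t)(p >> (16 * i));
}
```

The payoff on hardware with few native narrow formats is that one register and one ALU operation service multiple small operands.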
Sep, 1

Exploring Scalability in C++ Parallel STL Implementations

Since the advent of parallel algorithms in the C++17 Standard Template Library (STL), the STL has become a viable framework for creating performance-portable applications. Given multiple existing implementations of the parallel algorithms, a systematic, quantitative performance comparison is essential for choosing the appropriate implementation for a particular hardware configuration. In this work, we introduce a […]
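The interface under comparison looks like this: a C++17 execution policy selects a sequential or parallel implementation at the same call site. Which backend actually runs the algorithm (libstdc++ over TBB, MSVC's PSTL, HPX, oneDPL, and so on) is exactly the implementation choice such benchmarks inform. This sketch assumes a toolchain where `<execution>` is available (e.g. GCC with TBB installed).

```cpp
#include <execution>
#include <numeric>
#include <vector>

// C++17 parallel STL: the std::execution::par policy requests a
// parallel reduction; swapping in std::execution::seq at the same
// call site yields the sequential algorithm.
double par_reduce(const std::vector<double>& v) {
    return std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
}
```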

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hgpu.org