Posts
Sep 29
miniLB: A Performance Portability Study of Lattice-Boltzmann Simulations
The Lattice Boltzmann Method (LBM) is a computational technique in Computational Fluid Dynamics (CFD) that has gained popularity due to its high parallelism and ability to handle complex geometries with minimal effort. Although LBM frameworks are increasingly important in various industries and research fields, their complexity makes them difficult to modify and can lead to […]
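As a flavor of what such a solver computes, below is a minimal sketch (not miniLB's actual code) of the BGK collision step on a D2Q9 lattice; every cell updates independently, which is the source of LBM's high parallelism. All names and values are illustrative assumptions.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr int Q = 9;          // discrete velocities per cell (D2Q9)
constexpr double tau = 0.6;   // relaxation time (assumed value)

// One BGK collision step: relax each distribution toward local equilibrium.
void collide(std::vector<std::array<double, Q>>& f,
             const std::vector<std::array<double, Q>>& feq) {
    // Cells are independent, so this loop parallelizes trivially.
    for (std::size_t cell = 0; cell < f.size(); ++cell)
        for (int q = 0; q < Q; ++q)
            f[cell][q] -= (f[cell][q] - feq[cell][q]) / tau;
}
```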
Sep 29
Bitstream Database-Driven FPGA Programming Flow Based on Standard OpenCL
Field-programmable gate array (FPGA) vendors provide high-level synthesis (HLS) compilers with accompanying OpenCL runtimes to enable easier use of their devices by non-hardware experts. However, the current vendor-provided runtimes are not OpenCL-compliant, limiting application portability and making it difficult to integrate FPGA devices into heterogeneous computing platforms. We propose an automated […]
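For context, here is a hedged sketch of the standard OpenCL host path that FPGA flows build on: the bitstream is produced offline by the HLS compiler and loaded at run time with clCreateProgramWithBinary. The file name and the minimal error handling are assumptions, not the paper's code.

```cpp
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, nullptr);

    cl_device_id device;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 1, &device, nullptr);

    cl_int err;
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);

    // Load the bitstream compiled offline by the HLS toolchain (path assumed).
    std::FILE* fp = std::fopen("kernel_bitstream.bin", "rb");
    if (!fp) return 1;
    std::fseek(fp, 0, SEEK_END);
    size_t size = (size_t)std::ftell(fp);
    std::fseek(fp, 0, SEEK_SET);
    std::vector<unsigned char> binary(size);
    std::fread(binary.data(), 1, size, fp);
    std::fclose(fp);

    // The key call: hand the precompiled binary to a standard OpenCL runtime.
    const unsigned char* bin = binary.data();
    cl_program prog =
        clCreateProgramWithBinary(ctx, 1, &device, &size, &bin, nullptr, &err);
    clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);
    // ... create kernels and enqueue work as with any OpenCL device ...
}
```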
Sep 29
Automatic Generation of OpenCL Code through Polyhedral Compilation with LLM
In recent years, a multitude of AI solutions based on Large Language Models (LLMs) has emerged to facilitate code generation. These tools empower programmers to automate their work. Automatic programming is also the domain of optimizing compilers, primarily those based on the polyhedral model, which transforms the loop nests where most computation is concentrated. This article focuses […]
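To make the polyhedral connection concrete, here is the kind of static-control loop nest such compilers analyze: affine loop bounds and array subscripts let dependences be computed exactly, so the outer loops can legally be mapped to OpenCL work-item dimensions. Sizes and names are illustrative.

```cpp
constexpr int N = 1024;  // assumed problem size

// Classic dense matrix multiply: a loop nest that concentrates most of the
// computation and is a natural target for polyhedral transformation.
void matmul(const float* A, const float* B, float* C) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += A[i * N + k] * B[k * N + j];
            C[i * N + j] = acc;  // (i, j) could become a 2D NDRange point
        }
}
```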
Sep 29
OpenACC offloading of the MFC compressible multiphase flow solver on AMD and NVIDIA GPUs
GPUs are the heart of the latest generations of supercomputers. We efficiently accelerate a compressible multiphase flow solver via OpenACC on NVIDIA and AMD Instinct GPUs. Optimization is accomplished by specifying the directive clauses ‘gang vector’ and ‘collapse’. Further speedups of six and ten times are achieved by packing user-defined types into coalesced multidimensional arrays […]
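MFC itself is a Fortran code, but a hedged C++ analogue shows what those clauses express on a triply nested loop: collapse(3) fuses the loops into one iteration space, and gang vector spreads it across thread blocks and vector lanes. All identifiers below are assumptions.

```cpp
// Illustrative OpenACC offload of a flow-field update (not MFC's code).
void update(double* q, const double* rhs, double dt,
            int nx, int ny, int nz) {
    const long n = (long)nx * ny * nz;
    #pragma acc parallel loop gang vector collapse(3) copy(q[0:n]) copyin(rhs[0:n])
    for (int i = 0; i < nx; ++i)
        for (int j = 0; j < ny; ++j)
            for (int k = 0; k < nz; ++k) {
                const long idx = ((long)i * ny + j) * nz + k;
                q[idx] += dt * rhs[idx];
            }
}
```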
Sep 22
The Landscape of GPU-Centric Communication
In recent years, GPUs have become the preferred accelerators for HPC and ML applications due to their parallelism and high memory bandwidth. While GPUs boost computation, inter-GPU communication can create scalability bottlenecks, especially as the number of GPUs per node and per cluster grows. Traditionally, the CPU managed multi-GPU communication, but advancements in GPU-centric communication now […]
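As a small illustration of taking the CPU out of the data path, here is a sketch that assumes a CUDA-aware MPI library, which accepts device pointers directly so no staging through host memory is needed. Buffer size and tags are illustrative.

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float* buf = nullptr;
    cudaMalloc(&buf, n * sizeof(float));  // device memory, not host memory

    // A CUDA-aware MPI accepts the device pointer directly; a traditional
    // CPU-managed path would first cudaMemcpy into a host buffer.
    if (rank == 0)
        MPI_Send(buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(buf);
    MPI_Finalize();
}
```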
Sep 22
Challenging Portability Paradigms: FPGA Acceleration Using SYCL and OpenCL
As interest in FPGA-based accelerators for HPC applications increases, new challenges arise, especially concerning programming and portability. This paper aims to provide a snapshot of the current state of FPGA tooling and its problems. To do so, we evaluate the performance portability of two frameworks for developing FPGA solutions for […]
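For readers unfamiliar with SYCL's single-source style, a minimal sketch follows; the same C++ code can target an FPGA (or a CPU or GPU) by switching the device selector and toolchain. The vector-add kernel is illustrative, not taken from the paper's benchmarks.

```cpp
#include <sycl/sycl.hpp>

int main() {
    // Vendor toolchains provide FPGA selectors; the default selector is used
    // here to keep the sketch portable.
    sycl::queue q{sycl::default_selector_v};

    const size_t n = 1024;
    float* a = sycl::malloc_shared<float>(n, q);
    float* b = sycl::malloc_shared<float>(n, q);
    float* c = sycl::malloc_shared<float>(n, q);
    for (size_t i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Kernel and host code live in one C++ source file.
    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        c[i] = a[i] + b[i];
    }).wait();

    sycl::free(a, q); sycl::free(b, q); sycl::free(c, q);
}
```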
Sep 22
Collection skeletons: declarative abstractions for data collections
Modern programming languages provide programmers with rich abstractions for data collections as part of their standard libraries, e.g., Containers in the C++ STL, the Java Collections Framework, or the Scala Collections API. Typically, these collections frameworks are organised as hierarchies that provide programmers with common abstract data types (ADTs) like lists, queues, and stacks. While […]
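A small sketch of the underlying idea (not the paper's actual API): code written against an abstract sequence notion runs unchanged over different concrete containers, leaving the choice of representation to the framework or the programmer.

```cpp
#include <deque>
#include <iostream>
#include <list>
#include <vector>

// Generic over any sequence-like collection exposing iteration and
// value_type; the concrete ADT implementation is interchangeable.
template <typename Sequence>
auto sum(const Sequence& xs) {
    typename Sequence::value_type total{};
    for (const auto& x : xs) total += x;
    return total;
}

int main() {
    std::vector<int> v{1, 2, 3};
    std::list<int>   l{1, 2, 3};
    std::deque<int>  d{1, 2, 3};
    // Same declarative intent, three different concrete representations.
    std::cout << sum(v) << ' ' << sum(l) << ' ' << sum(d) << '\n';
}
```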
Sep 22
RenderKernel: High-level programming for real-time rendering systems
Real-time rendering applications leverage heterogeneous computing to optimize performance. However, software development across multiple devices presents challenges, including data layout inconsistencies, synchronization issues, resource management complexities, and architectural disparities. Additionally, building such systems requires verbose and unsafe programming models. Recent developments in domain-specific and unified shading languages aim to mitigate these issues. Yet, […]
Sep 22
A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing are: 1) applying multithreading technology on single-core or multi-core CPUs; 2) incorporating powerful parallel computing devices such as GPUs, FPGAs, and other accelerators; and 3) utilizing special parallel architectures like Single Instruction/Multiple Data (SIMD). Many researchers […]
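A brief sketch combining the first and third approaches, OpenMP threads plus SIMD vectorization, on an illustrative saxpy loop (the kernel and names are assumptions, not the paper's benchmark).

```cpp
#include <cstddef>

// Iterations are split across CPU threads ('parallel for'), and each
// thread's chunk is vectorized so one instruction handles several
// elements at once ('simd').
void saxpy(float* y, const float* x, float a, std::size_t n) {
    #pragma omp parallel for simd
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```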
Sep 15
DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL
We present an assignment for a full Parallel Computing course. Since 2017/2018, we have proposed a different problem each academic year to illustrate various methodologies for approaching the same computational problem using different parallel programming models. They are designed to be parallelized using shared-memory programming with OpenMP, distributed-memory programming with MPI, and GPU programming with […]
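As a hedged sketch of what the shared-memory flavor of such an assignment might look like (the scoring scheme here is illustrative, not the assignment's specification): score a short pattern against every offset of a long DNA sequence, in parallel over offsets. The same decomposition maps naturally onto MPI ranks or CUDA/OpenCL work-items.

```cpp
#include <string>
#include <vector>

// Count matching bases of 'pat' at every alignment offset in 'seq'.
std::vector<int> match_scores(const std::string& seq, const std::string& pat) {
    if (pat.size() > seq.size()) return {};
    const long long n = (long long)(seq.size() - pat.size() + 1);
    std::vector<int> score(n, 0);

    // Offsets are independent, so OpenMP can split them across threads.
    #pragma omp parallel for
    for (long long off = 0; off < n; ++off) {
        int s = 0;
        for (std::size_t j = 0; j < pat.size(); ++j)
            if (seq[(std::size_t)off + j] == pat[j]) ++s;  // +1 per match
        score[off] = s;
    }
    return score;
}
```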
Sep 15
Optimal Workload Placement on Multi-Instance GPUs
There is a pressing need to optimize the usage of Graphics Processing Units (GPUs), which have arguably become one of the most expensive and sought-after IT resources. To help with this goal, several of the current generation of GPUs support a partitioning feature called Multi-Instance GPU (MIG), to allow multiple workloads to share […]
Sep 15
Enhancing Code Portability, Problem Scale, and Storage Efficiency in Exascale Applications
The growing diversity of hardware and software stacks poses additional development challenges for high-performance software as we move to exascale systems. Re-engineering software for each new platform is no longer practical due to increasing heterogeneity. Hardware designers are prioritizing AI/ML features like reduced precision that increase performance but sacrifice accuracy. The growing scale of […]