
Posts

Oct, 12

Accelerating cosmological simulations on GPUs: a portable approach using OpenMP

In this work we present the porting to Graphics Processing Units (GPUs), using OpenMP target directives, and the optimization of a key module within the cosmological pinocchio code, a Lagrangian Perturbation Theory (LPT)-based framework widely used for generating dark matter (DM) halo catalogs. Our optimization focuses on a specific segment of the code responsible for calculating […]
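
As a rough illustration of the approach described (not the actual pinocchio source), an OpenMP target region of this form offloads a loop to the GPU while remaining portable host code; the loop body here is a placeholder:

```cpp
// Minimal sketch of the OpenMP target-offload pattern the abstract refers to.
// Compile with an offload-capable compiler, e.g. nvc++ -mp=gpu or clang -fopenmp.
#include <cstddef>
#include <vector>

void scale_density(std::vector<double>& delta, double growth) {
    double* d = delta.data();
    const std::size_t n = delta.size();
    // Map the array to the device, spread the loop across GPU teams and
    // threads, and copy the result back when the region ends.
    #pragma omp target teams distribute parallel for map(tofrom: d[0:n])
    for (std::size_t i = 0; i < n; ++i)
        d[i] *= growth;   // placeholder for the actual per-element update
}
```
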
Oct, 12

ConCuR: Conciseness Makes State-of-the-Art Kernel Generation

GPU kernel generation by LLMs has recently experienced rapid development, leveraging test-time scaling and reinforcement learning techniques. However, a key challenge for kernel generation is the scarcity of high-quality data, as most high-quality kernels are proprietary and not open-source. This challenge prevents us from leveraging supervised fine-tuning to align LLMs to the kernel generation task. […]
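
For context, the artifacts such systems are asked to produce look like the following minimal CUDA kernel (an illustrative example of the kernel generation task, not taken from the paper; the names are hypothetical):

```cpp
// A small hand-written CUDA kernel of the kind LLM kernel generators
// are trained and evaluated on: a fused elementwise scale + ReLU.
__global__ void fused_scale_relu(const float* __restrict__ in,
                                 float* __restrict__ out,
                                 float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = alpha * in[i];
        out[i] = v > 0.0f ? v : 0.0f;  // one read, one write per element
    }
}
```
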
Oct, 12

Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR

MLIR (Multi-Level Intermediate Representation) has rapidly become a foundational technology for modern compiler frameworks, enabling extensibility across diverse domains. However, ensuring the correctness and robustness of MLIR itself remains challenging. Existing fuzzing approaches, based on manually crafted templates or rule-based mutations, struggle to generate sufficiently diverse and semantically valid test cases, making it difficult to expose subtle […]
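
A self-adaptive mutation loop of the general kind described can be sketched as follows (a schematic, not the paper's framework; the mutator names and file handling are hypothetical placeholders):

```cpp
// Schematic fuzz loop that interleaves learning and exploration: mutation
// operators are chosen in proportion to how often they previously exposed
// failures, with uniform initial weights providing the exploration side.
#include <cstdlib>
#include <random>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> mutators = {"swap-ops", "drop-region", "retype-value"};
    std::vector<double> weight(mutators.size(), 1.0);
    std::mt19937 rng(42);
    for (int iter = 0; iter < 1000; ++iter) {
        std::discrete_distribution<int> pick(weight.begin(), weight.end());
        int m = pick(rng);
        // Placeholder: mutate a seed case with mutators[m], then validate
        // the result by running it through mlir-opt.
        std::string cmd = "mlir-opt --verify-each mutated_case.mlir > /dev/null 2>&1";
        if (std::system(cmd.c_str()) != 0)
            weight[m] += 1.0;   // reward operators that expose failures
    }
}
```
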
Oct, 12

High-Performance Computing: from Optimization to Automation

The digital revolution of our society is driven by major technological advancements, enabled not only by the growing capabilities of computers but also by the evolution of their uses. These developments result from a complex interaction between what we can do, what we know how to do, and what we want to do, all within […]
Oct, 12

EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models

CUDA kernel optimization has become a critical bottleneck for AI performance, as deep learning training and inference efficiency directly depends on highly optimized GPU kernels. Despite the promise of Large Language Models (LLMs) for automating kernel optimization, this field suffers from a fragmented ecosystem of isolated and incomparable approaches with unclear problem formulations. Furthermore, general-purpose […]
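
As an example of the search space such systems operate in, here is one classic manual optimization an evolutionary kernel generator might rediscover: a block sum using warp shuffles instead of shared-memory round trips (illustrative CUDA, not from the paper; assumes blockDim.x is a multiple of 32 and at most 1024, and that out is zero-initialized):

```cpp
// Butterfly reduction within one 32-thread warp via register shuffles.
__inline__ __device__ float warp_sum(float v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;
}

__global__ void block_sum(const float* __restrict__ in, float* out, int n) {
    __shared__ float partial[32];                  // one slot per warp
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;
    v = warp_sum(v);                               // reduce within each warp
    if ((threadIdx.x & 31) == 0) partial[threadIdx.x >> 5] = v;
    __syncthreads();
    if (threadIdx.x < 32) {                        // warp 0 reduces the partials
        int nwarps = (blockDim.x + 31) >> 5;
        v = (threadIdx.x < nwarps) ? partial[threadIdx.x] : 0.0f;
        v = warp_sum(v);
        if (threadIdx.x == 0) atomicAdd(out, v);   // out assumed zero-initialized
    }
}
```
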
Oct, 5

VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs

We propose VibeCodeHPC, an automatic tuning system for HPC programs based on multi-agent LLMs for code generation. VibeCodeHPC tunes programs through multi-agent role allocation and iterative prompt refinement. We describe the system configuration with four roles: Project Manager (PM), System Engineer (SE), Programmer (PG), and Continuous Delivery (CD). We introduce dynamic agent deployment and activity […]
Oct, 5

Compile-Time Resource Safety for GPU APIs: A Low-Overhead Typestate Framework

GPU APIs such as OpenCL require correct host-side sequencing of buffer and queue operations; errors in state transitions and synchronization typically only become visible at runtime in C bindings. We present TypeSec, a lightweight Rust typestate framework that encodes buffer and event protocols in the type system and thereby excludes invalid states already at compile […]
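
TypeSec encodes these protocols as Rust typestates; purely as a conceptual sketch (in C++ to match the other examples on this page, and not the framework's API), phantom state types plus consuming transitions reject invalid operation orders at compile time:

```cpp
// Minimal typestate sketch: a buffer that must be mapped before writing.
// Invalid orders fail to compile rather than fault at runtime.
#include <type_traits>
#include <utility>

struct Unmapped {};  // phantom states carried only in the type
struct Mapped   {};

template <typename State>
class Buffer {
public:
    // map() consumes an Unmapped buffer (rvalue-qualified) and yields a
    // Mapped one; mapping twice trips the static_assert.
    Buffer<Mapped> map() && {
        static_assert(std::is_same_v<State, Unmapped>, "buffer is already mapped");
        return Buffer<Mapped>{};
    }
    void write(float /*value*/) {
        static_assert(std::is_same_v<State, Mapped>, "map the buffer before writing");
        // ... enqueue the actual device write here ...
    }
};

int main() {
    Buffer<Unmapped> b;
    Buffer<Mapped> m = std::move(b).map();  // ok: Unmapped -> Mapped transition
    m.write(1.0f);                          // ok: writing requires the Mapped state
    // std::move(m).map();                  // rejected at compile time
}
```

The Rust version is stronger because moves are enforced by the borrow checker, but the state machine shape is the same.
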
Oct, 5

exa-AMD: An Exascale-Ready Framework for Accelerating the Discovery and Design of Functional Materials

Exascale computing is transforming the field of materials science by enabling simulations of unprecedented scale and complexity. We present exa-AMD, an open-source, high-performance simulation code specifically designed for accelerated materials discovery on modern supercomputers. exa-AMD addresses the computational challenges inherent in large-scale materials discovery by employing task-based parallelization strategies and optimized data management tailored for […]
Oct, 5

Performance and Numerical Aspects of Decompositional Factorizations with FP64 Floating-Point Emulation in INT8

Mixing precisions for performance has been an ongoing trend as modern hardware accelerators have started to include new, mostly lower-precision, data formats. The advantage of using them is the great potential for performance gains and energy savings; the disadvantage is the numerical issues not present in the standard-mandated floating-point formats. Split integer emulation of FP64 […]
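
The basic splitting step behind such emulation can be sketched as follows (Ozaki-style slicing of the FP64 mantissa into signed 8-bit pieces; a minimal sketch, not the paper's code):

```cpp
// Each signed int8 slice carries 7 mantissa bits, so 8 slices cover the
// 53-bit FP64 mantissa and the round trip below is exact.
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
    const double x = 3.141592653589793;
    const int k = 8;
    int exp;
    double r = std::frexp(x, &exp);        // x = r * 2^exp, 0.5 <= |r| < 1
    std::int8_t slice[8];
    for (int i = 0; i < k; ++i) {
        r = std::ldexp(r, 7);              // shift 7 bits into the integer part
        slice[i] = static_cast<std::int8_t>(std::trunc(r));
        r -= slice[i];                     // keep the remainder for the next slice
    }
    // Reconstruct: x = 2^exp * sum_i slice[i] * 2^(-7(i+1)). On hardware,
    // int8 slice products would be accumulated in INT32 (e.g. on integer
    // tensor cores) before this final rescaling.
    double y = 0.0;
    for (int i = k - 1; i >= 0; --i) y = (y + slice[i]) / 128.0;
    std::printf("error = %.3e\n", std::ldexp(y, exp) - x);  // 0 for k = 8
}
```
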
Oct, 5

Opal: A Modular Framework for Optimizing Performance using Analytics and LLMs

Large Language Models (LLMs) show promise for automated code optimization but struggle without performance context. This work introduces Opal, a modular framework that connects performance analytics insights with the vast body of published optimization knowledge by guiding LLMs to generate informed, trustworthy optimizations. Unlike traditional performance tools that identify bottlenecks but stop short of actionable suggestions, Opal […]
Sep, 28

TRUST: the HPC open-source CFD platform – from CPU to GPU

Since 1993, the CEA has developed TRUST, an open-source CFD software platform designed to address a wide range of thermohydraulic problems. Initially focused on nuclear applications, the platform has progressively evolved to support incompressible single-phase flows, low-Mach-number reactive flows, and fully compressible multi-phase flows. TRUST incorporates a variety of numerical schemes and supports multiple mesh […]
Sep, 28

Towards GPU Parallelism Abstractions in Rust: A Case Study with Linear Pipelines

Programming Graphics Processing Units (GPUs) for general-purpose computation remains a daunting task, often requiring specialized knowledge of low-level APIs like CUDA or OpenCL. While Rust has emerged as a modern, safe, and performant systems programming language, its adoption in the GPU computing domain is still nascent. Existing approaches often involve intricate compiler modifications or complex […]
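
The pattern the case study targets, a linear pipeline of elementwise stages fused into a single GPU pass, looks like this when written by hand in CUDA (an illustrative sketch with hypothetical stages, not code from the paper, which works at the Rust level):

```cpp
// Three pipeline stages composed into one kernel, so the data makes a
// single trip through device memory instead of one per stage.
__device__ float stage1(float x) { return x * 2.0f; }        // hypothetical stages
__device__ float stage2(float x) { return x + 1.0f; }
__device__ float stage3(float x) { return x > 0.0f ? x : 0.0f; }

__global__ void pipeline(const float* __restrict__ in,
                         float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = stage3(stage2(stage1(in[i])));  // one read, one write per element
}
```

A safe Rust abstraction over this pattern would let the stages be composed as ordinary functions while the library performs the fusion and launch.
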
