HPC++: An LLVM-Based Automatic Parallelization Framework with Heterogeneous CPU–GPU Execution
@article{martirosyan2026hpc++,
title={HPC++: An LLVM-Based Automatic Parallelization Framework with Heterogeneous CPU–GPU Execution},
author={Martirosyan, Vahram},
year={2026}
}
We present HPC++, an automatic parallelization framework that transforms sequential C++ programs into efficient parallel implementations targeting both multi-core CPUs and OpenCL-capable GPUs. Operating at the LLVM Intermediate Representation (IR) level, HPC++ performs pattern-driven analysis to detect seven distinct parallelization strategies (reductions, elementwise maps, matrix multiplications, nested loops, search operations, histogram patterns, and independent function calls) and emits optimized parallel wrappers without requiring any source-code modifications. On an Intel Core Ultra 7 255H (16 cores) with an integrated Intel Graphics GPU (128 CUs) that uses a Unified Memory Architecture (UMA), the framework achieves peak speedups of 2009.4× on GPU-offloaded workloads and 32.1× on CPU-parallelized tasks, while maintaining numerical correctness across all 134 unit tests and 18 integration tests. We describe the system architecture, the IR-level analysis and transformation pipeline, and the dual-target CPU/GPU code generation strategy, and we present comprehensive benchmark results across scientific computing workloads.
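To make the reduction pattern named in the abstract concrete, the following is a minimal hand-written sketch, not HPC++ output. It pairs a sequential summation loop of the kind a pattern detector could classify as a reduction with one possible multi-threaded equivalent. The function names `sum_sequential` and `sum_parallel`, the `std::thread`-based chunking, and the default of four workers are all assumptions made for illustration; the abstract does not specify what the generated wrappers look like.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sequential reduction: every iteration updates one accumulator.
double sum_sequential(const std::vector<double>& data) {
    double acc = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        acc += data[i];
    }
    return acc;
}

// Hypothetical parallel equivalent (illustrative only, not generated by HPC++).
// Each worker sums a contiguous chunk into its own slot; the partial sums
// are then combined on the calling thread.
double sum_parallel(const std::vector<double>& data, unsigned num_threads = 4) {
    assert(num_threads > 0 && "need at least one worker");
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    // Ceiling division so that the chunks cover the whole input.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            double local = 0.0;  // private accumulator, no shared writes in the loop
            for (std::size_t i = begin; i < end; ++i) {
                local += data[i];
            }
            partial[t] = local;
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Because floating-point addition is not associative, the chunked version can differ from the sequential one in the last bits for general inputs, which is one reason a framework of this kind needs numerical correctness tests such as those the abstract reports.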
February 23, 2026 by hgpu




