
HPC++: An LLVM-Based Automatic Parallelization Framework with Heterogeneous CPU–GPU Execution

Vahram Martirosyan

@article{martirosyan2026hpc++,
   title={HPC++: An LLVM-Based Automatic Parallelization Framework with Heterogeneous CPU–GPU Execution},
   author={Martirosyan, Vahram},
   year={2026}
}


We present HPC++, an automatic parallelization framework that transforms sequential C++ programs into efficient parallel implementations targeting both multi-core CPUs and OpenCL-capable GPUs. Operating at the LLVM Intermediate Representation (IR) level, HPC++ performs pattern-driven analysis to detect seven distinct parallelization strategies (reductions, elementwise maps, matrix multiplications, nested loops, search operations, histogram patterns, and independent function calls) and emits optimized parallel wrappers without requiring any source-code modifications. On an Intel Core Ultra 7 255H (16 cores) with an integrated Intel Graphics GPU (128 CUs) employing a Unified Memory Architecture (UMA), the framework achieves peak speedups of 2009.4× on GPU-offloaded workloads and 32.1× on CPU-parallelized tasks, while maintaining numerical correctness across all 134 unit tests and 18 integration tests. We describe the system architecture, the IR-level analysis and transformation pipeline, and the dual-target CPU/GPU code generation strategy, and we present comprehensive benchmark results across scientific computing workloads.
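To make two of the listed patterns concrete, here is a minimal sketch of what CPU-side parallel wrappers for a reduction and a histogram might look like. It assumes a plain `std::thread` backend with static contiguous chunking. The function names `parallel_sum` and `parallel_histogram` are hypothetical. This is not code generated by HPC++, whose actual wrappers are produced from LLVM IR and are not shown on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical CPU wrapper for the "reduction" pattern. The sequential loop
//   double s = 0; for (i) s += a[i];
// is split into contiguous chunks; each thread sums its chunk into a private
// slot, and the partial sums are combined after all threads have joined.
double parallel_sum(const std::vector<double>& a, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    const std::size_t n = a.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceil(n / T)
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&a, &partial, t, begin, end] {
            double local = 0.0;
            for (std::size_t i = begin; i < end; ++i) local += a[i];
            partial[t] = local;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}

// Hypothetical CPU wrapper for the "histogram" pattern. Each thread fills a
// private copy of the bins, and the copies are merged after join(), so the
// hot loop needs no atomics or locks. Out-of-range keys are ignored.
std::vector<long> parallel_histogram(const std::vector<int>& keys,
                                     std::size_t num_bins,
                                     unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::vector<long>> local(num_threads, std::vector<long>(num_bins, 0));
    std::vector<std::thread> workers;
    const std::size_t n = keys.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&keys, &local, t, begin, end, num_bins] {
            for (std::size_t i = begin; i < end; ++i) {
                const int k = keys[i];
                if (k >= 0 && static_cast<std::size_t>(k) < num_bins) ++local[t][k];
            }
        });
    }
    for (auto& w : workers) w.join();
    std::vector<long> bins(num_bins, 0);
    for (const auto& h : local)
        for (std::size_t b = 0; b < num_bins; ++b) bins[b] += h[b];
    return bins;
}
```

Both sketches rely on the same design choice: each thread writes only to private storage, and a single merge step follows the join, so no synchronization is needed inside the loop. One caveat for the reduction is that splitting a floating-point sum reassociates the additions. For non-integer data, the result may therefore differ in the last bits from the sequential loop, which matters when numerical correctness is checked against a sequential reference.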