high performance computing on graphics processing units: hgpu.org


HPC++: An LLVM-Based Automatic Parallelization Framework with Heterogeneous CPU–GPU Execution

Vahram Martirosyan

DOI:10.13140/RG.2.2.21490.77763

@article{martirosyan2026hpc++,
  title={HPC++: An LLVM-Based Automatic Parallelization Framework with Heterogeneous CPU–GPU Execution},
  author={Martirosyan, Vahram},
  year={2026}
}


We present HPC++, an automatic parallelization framework that transforms sequential C++ programs into efficient parallel implementations targeting both multi-core CPUs and OpenCL-capable GPUs. Operating at the LLVM Intermediate Representation (IR) level, HPC++ performs pattern-driven analysis to detect seven distinct parallelization strategies (reductions, elementwise maps, matrix multiplications, nested loops, search operations, histogram patterns, and independent function calls) and emits optimized parallel wrappers that require no source-code modifications. On an Intel Core Ultra 7 255H (16 cores) with an integrated Intel Graphics GPU (128 CUs) employing a Unified Memory Architecture (UMA), the framework achieves peak speedups of 2009.4× on GPU-offloaded workloads and 32.1× on CPU-parallelized tasks while maintaining numerical correctness across all 134 unit tests and 18 integration tests. We describe the system architecture, the IR-level analysis and transformation pipeline, and the dual-target CPU/GPU code generation strategy, and we present comprehensive benchmark results across scientific computing workloads.
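To make the reduction pattern named in the abstract concrete, here is a minimal hand-written sketch of a sequential summation loop and a multi-threaded equivalent. It is illustrative only: the function names `sum_sequential` and `sum_parallel` are hypothetical, the chunk-per-thread scheme using `std::thread` is an assumption, and nothing here is taken from the code HPC++ actually emits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sequential form: a loop-carried accumulation into a single scalar,
// i.e. the shape of loop a reduction detector would look for.
long long sum_sequential(const std::vector<int>& data) {
    long long total = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        total += data[i];
    }
    return total;
}

// Hypothetical parallel equivalent (not HPC++ output): each thread reduces
// one contiguous chunk into its own slot, then the partial sums are combined.
long long sum_parallel(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);  // precondition: at least one worker
    const std::size_t n = data.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceil(n / threads)
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            long long local = 0;  // thread-private accumulator
            for (std::size_t i = begin; i < end; ++i) {
                local += data[i];
            }
            partial[t] = local;   // each thread writes a distinct element
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    // Final combine step over the per-thread partial sums.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Integer addition is associative, so the chunked version returns exactly the same value as the sequential loop. For floating-point reductions the summation order changes, which is one reason a framework of this kind needs numerical-correctness tests like those the abstract mentions.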

Tags: Computer science, HPC, LLVM, OpenCL

February 23, 2026 by hgpu
