high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

Erel Kaplan, Tomer Bitan, Lian Ghrayeb, Le Chen, Tom Yotam, Niranjan Hasabnis, Gal Oren

Technion – Israel Institute of Technology, Haifa, Israel

arXiv:2601.04327 [cs.DC]

DOI:10.48550/arXiv.2601.04327

@misc{kaplan2026paracodexprofilingguidedautonomouscoding,

title={ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation},

author={Erel Kaplan and Tomer Bitan and Lian Ghrayeb and Le Chen and Tom Yotam and Niranjan Hasabnis and Gal Oren},

year={2026},

eprint={2601.04327},

archivePrefix={arXiv},

primaryClass={cs.DC},

url={https://arxiv.org/abs/2601.04327}

}

Download (PDF)

View

Source

Source codes

Package:

ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

718

views

Parallel programming is central to HPC and AI, but producing code that is correct and fast remains challenging, especially for OpenMP GPU offload, where data movement and tuning dominate. Autonomous coding agents can compile, test, and profile on target hardware, but outputs are brittle without domain scaffolding. We present ParaCodex, an HPC-engineer workflow that turns a Codex-based agent into an autonomous OpenMP GPU offload system using staged hotspot analysis, explicit data planning, correctness gating, and profiling-guided refinement. We evaluate translation from serial CPU kernels to OpenMP GPU offload kernels on HeCBench, Rodinia, and NAS. After excluding five kernels, ParaCodex succeeded on all 31 valid kernels. The generated kernels improved GPU time over reference OpenMP implementations in 25/31 cases, achieving geometric-mean speedups of 3x on HeCBench and 5x on Rodinia, and outperforming a zero-shot Codex baseline on all suites. We also evaluate CUDA to OpenMP offload translation on ParEval, where ParaCodex maintains high compilation and validation rates in code-only and end-to-end settings.

Tags: Code generation, Computer science, CUDA, LLM, nVidia, nVidia GeForce RTX 4060, OpenMP, Package

January 12, 2026 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

KernelGYM & Dr. Kernel: A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

* * *

high performance computing on graphics processing units: hgpu.org

ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

Package:

Your response

Recent source codes

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

KernelGYM & Dr. Kernel: A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Vortex-Optimized Light-weight Toolchain (VOLT)

SciDef: Automated Definition Extraction from Scientific Literature

bioagent-bench: Benchmark for evaluating LLM agents in bioinformatics

Benchmark suite for LLM inference on NVIDIA consumer GPUs

Theorizer: from the paper Generating Literature-Driven Scientific Discoveries at Scale

Nsight Python: a Python kernel profiling interface based on NVIDIA Nsight Tools

Awesome LLM-Driven Kernel Generation

Most viewed papers (last 30 days)

ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

Package:

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)