high performance computing on graphics processing units: hgpu.org


CRINK: Automatic CUDA code generation for affine C programs

Akanksha Singh

Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur

Indian Institute of Technology, 2014


Parallel programming has become an efficient solution for a large number of compute-intensive applications. Graphics Processing Units (GPUs), traditionally designed to process computer graphics, are now widely used to process large chunks of data in parallel in many computationally expensive applications. Because developing parallel programs for platforms such as CUDA and OpenCL requires knowledge of platform-specific concepts, it is very convenient if the parallelization of the compute-intensive sections of a program can be automated. We develop CRINK, an end-to-end code transformation system that converts sequential C programs into their parallel counterparts in CUDA. While converting C programs to CUDA C programs, CRINK targets the expensive sections of the program (the sections within loops) for parallelization. It handles both regular and irregular kernels. We use Cycle Shrinking and Extended Cycle Shrinking for parallelism extraction and loop transformations. To analyse performance, we run CRINK on expensive sections taken from the ZERO RC, SPEC, SANDIA RULES, Treepack and Higbie standard benchmarks. Analysis over 66 different configurations of the benchmarks and datasets shows drastic drops in computation time as the number of threads is increased when executing the code transformed by CRINK.

Tags: Code generation, Computer science, CUDA, nVidia, Tesla C1060, Thesis

August 10, 2015 by hgpu



