high performance computing on graphics processing units: hgpu.org

ALPyNA: Acceleration of Loops in Python for Novel Architectures

Dejice Jacob, Jeremy Singer

School of Computing Science, University of Glasgow, Glasgow, UK

6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming (ARRAY ’19), 2019

We present ALPyNA, an automatic loop parallelization framework for Python, which analyzes data dependences within nested loops and dynamically generates CUDA kernels for GPU execution. The ALPyNA system applies classical dependence analysis techniques to discover and exploit potential parallelism. The skeletal structure of the dependence graph is determined statically (if possible) or at runtime; this is combined with type and bounds information discovered at runtime, to auto-generate high-performance kernels for offload to GPU. We demonstrate speedups of up to 1000x relative to the native CPython interpreter across four array-intensive numerical Python benchmarks. Performance improvement is related to both iteration domain size and dependence graph complexity. Nevertheless, this approach promises to bring the benefits of manycore parallelism to application developers.
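As an illustration of the kind of classical dependence analysis the abstract refers to, the sketch below applies the GCD test to a pair of affine array subscripts in a single loop. This is not ALPyNA's implementation or API: the function name `may_depend` and the example subscripts are hypothetical, and the GCD test is only one of several textbook dependence tests. It shows how a loop whose iterations never touch the same element can be flagged as safe to parallelize, while a loop that reads the previous iteration's result cannot.

```python
from math import gcd


def may_depend(a, b, c, d):
    """GCD test for a write to A[a*i + b] and a read of A[c*j + d].

    Returns False when a*i + b == c*j + d has no integer solution, so the
    two accesses can never touch the same element. Returns True when a
    solution may exist. The test ignores loop bounds, so True is a
    conservative "maybe", not proof of a dependence.
    """
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are constants: they collide only if equal.
        return b == d
    return (d - b) % g == 0


# for i in range(n): A[2*i] = A[2*i + 1] + 1
# Writes hit even indices, reads hit odd indices: no dependence.
independent = may_depend(2, 0, 2, 1)

# for i in range(1, n): A[i] = A[i - 1] + 1
# Each iteration reads the element written by the previous one.
carried = may_depend(1, 0, 1, -1)

print(independent, carried)
```

In this sketch, a loop for which every pair of accesses fails the test would be a candidate for mapping one iteration to one GPU thread, whereas a loop with a possible carried dependence would need further analysis or sequential execution.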

Tags: Code generation, Computer science, CUDA, nVidia, nVidia GeForce GTX 1060, Python

September 22, 2019 by hgpu
