high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Composition and Reuse with Compiled Domain-Specific Languages

Composition and Reuse with Compiled Domain-Specific Languages

Arvind K. Sujeeth, Tiark Rompf, Kevin J. Brown, HyoukJoong Lee, Hassan Chafi, Victoria Popic, Michael Wu, Aleksander Prokopec, Vojin Jovanovic, Martin Odersky, Kunle Olukotun

Stanford University

European Conference on Object-Oriented Programming (ECOOP’13), 2013

@inproceedings{sujeeth2013composition,

title={Composition and reuse with compiled domain-specific languages},

author={Sujeeth, Arvind K and Rompf, Tiark and Brown, Kevin J and Lee, H and Chafi, Hassan and Popic, Victoria and Wu, Michael and Prokopec, Aleksandar and Jovanovic, Vojin and Odersky, Martin and others},

booktitle={Proceedings of ECOOP},

year={2013}

}

Download (PDF)

View

Source

Source codes

Package:

Delite

1783

views

Programmers who need high performance currently rely on low-level, architecture-specific programming models (e.g. OpenMP for CMPs, CUDA for GPUs, MPI for clusters). Performance optimization with these frameworks usually requires expertise in the specific programming model and a deep understanding of the target architecture. Domain-specific languages (DSLs) are a promising alternative, allowing compilers to map problem-specific abstractions directly to low-level architecture-specific programming models. However, developing DSLs is difficult, and using multiple DSLs together in a single application is even harder because existing compiled solutions do not compose together. In this paper, we present four new performance-oriented DSLs developed with Delite, an extensible DSL compilation framework. We demonstrate new techniques to compose compiled DSLs embedded in a common backend together in a single program and show that generic optimizations can be applied across the different DSL sections. Our new DSLs are implemented with a small number of reusable components (less than 9 parallel operators total) and still achieve performance up to 125x better than library implementations and at worst within 30% of optimized stand-alone DSLs. The DSLs retain good performance when composed together, and applying cross-DSL optimizations results in up to an additional 1.82x improvement.

Tags: Computer science, CUDA, MPI, nVidia, Package, Performance, Tesla C2050

May 31, 2013 by hgpu

No votes yet.

Please wait...

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

high performance computing on graphics processing units: hgpu.org

Composition and Reuse with Compiled Domain-Specific Languages

Package:

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 second

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA

Most viewed papers (last 30 days)

Composition and Reuse with Compiled Domain-Specific Languages

Package:

Share this:

Recent source codes

Most viewed papers (last 30 days)