Extending OmpSs to support CUDA and OpenCL in C, C++ and Fortran Applications

Florentino Sainz, Sergi Mateo, Vicenc Beltran, Jose L. Bosque, Xavier Martorell, Eduard Ayguade

Barcelona Supercomputing Center, Barcelona, Spain

Barcelona Supercomputing Center, Research report, 2014

CUDA and OpenCL are the most widely used programming models for exploiting hardware accelerators. Both provide a C-based programming language for writing accelerator kernels and a host API that glues the host and kernel parts together. Although this model is a clear improvement over a low-level, ad hoc programming model for each hardware accelerator, it is still too complex and cumbersome for general adoption. For large and complex applications using several accelerators, the main problem becomes the explicit coordination and management of resources between the host and the hardware accelerators, which introduces a new family of issues (scheduling, data transfers, synchronization, …) that the programmer must take into account. In this paper, we propose a simple extension to OmpSs, a data-flow programming model, that dramatically simplifies the integration of accelerated code, in the form of CUDA or OpenCL kernels, into any C, C++ or Fortran application. Our proposal fully replaces the CUDA and OpenCL host APIs with a few pragmas, so we can leverage any kernel written in CUDA C or OpenCL C without any performance impact. Our compiler generates all the boilerplate code, while our runtime system takes care of kernel scheduling, data transfers between host and accelerators, and synchronization between the host and kernel parts. To evaluate our approach, we have ported several native CUDA and OpenCL applications to OmpSs by replacing all the CUDA or OpenCL API calls with a small number of pragmas. The OmpSs versions of these applications have competitive performance and scalability but significantly lower complexity than the original ones.
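
To make the idea of replacing the host API with pragmas concrete, here is a minimal sketch of what an annotated kernel declaration might look like. It is not taken from the report. The clause spellings (`device(cuda)`, `ndrange`, `copy_deps`, `in`/`inout` with `[n]` array shaping) follow the public OmpSs documentation as best recalled and should be treated as assumptions. The names `saxpy` and `saxpy_demo` are hypothetical. So that the sketch also builds with a plain C compiler, which ignores the pragmas, the kernel body is a serial host stand-in rather than a CUDA C kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* OmpSs-style annotations (clause spelling assumed, see the text above):
 * run saxpy as a CUDA task over a 1-D range of n work items in blocks of
 * 128, and derive the host/device copies from the in/inout dependences.
 * A compiler without OmpSs support simply ignores these pragmas. */
#pragma omp target device(cuda) ndrange(1, n, 128) copy_deps
#pragma omp task in([n]x) inout([n]y)
void saxpy(int n, float a, float *x, float *y);

/* Host stand-in so the sketch builds without an OmpSs toolchain.
 * In an OmpSs build, this body would instead be a __global__ CUDA C
 * kernel in a separate .cu file, computing one element per thread. */
void saxpy(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical driver (n must be positive): fills x with 1 and y with 2,
 * invokes the kernel like an ordinary function, and returns the sum of y.
 * Note the absence of explicit allocation, copy, or launch API calls. */
float saxpy_demo(int n, float a)
{
    float *x = malloc((size_t)n * sizeof *x);
    float *y = malloc((size_t)n * sizeof *y);
    assert(x != NULL && y != NULL);

    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    saxpy(n, a, x, y);      /* under OmpSs this call would spawn a task */
    #pragma omp taskwait    /* wait for outstanding tasks before reading y */

    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += y[i];

    free(x);
    free(y);
    return sum;
}
```

Calling `saxpy_demo(4, 3.0f)` with the host stand-in computes `3 * 1 + 2` for each of the four elements. The point of the sketch is the shape of the host code: the dependence clauses carry the information that explicit memory-copy and kernel-launch calls would otherwise encode.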

Tags: Computer science, CUDA, Fortran, nVidia, OpenCL, Tesla M2050

December 30, 2014 by hgpu
