high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection

Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection

Akihiro Hayashi, Kazuaki Ishizaki, Vivek Sarkar, Gita Koblents

Rice University

12th International Conference on the Principles and Practice of Programming on the Java Platform: virtual machines, languages, and tools (PPPJ’15), 2015

DOI:10.1145/2807426.2807429

BibTeX

Download (PDF)

View

Source

1866

views

High-level languages such as Java increase both productivity and portability with productive language features such as managed runtime, type safety, and precise exception semantics. Additionally, Java 8 provides parallel stream APIs with lambda expressions to facilitate parallel programming for mainstream users of multi-core CPUs and many-core GPUs. These high-level APIs avoid the complexity of writing natively running parallel programs with OpenMP and CUDA/OpenCL through Java Native Interface (JNI). The adoption of such high-level programming models offers opportunities for enabling compilers to perform parallel-aware optimizations and code generation. While many prior approaches have the ability to generate parallel code for both multi-core CPUs and many-core GPUs from Java and other high-level languages, selection of the preferred computing resource between CPUs and GPUs for individual kernels remains one of the most important challenges since a variety of factors affecting performance such as datasets and feature of programs need to be taken into account. This paper explores the possibility of using machine learning to address this challenge. The key idea is to enable a Java runtime to select a preferable hardware device with performance heuristics constructed by supervised machine-learning techniques. For this purpose, if our JIT compiler detects a parallel stream API, 1) our compiler records features of its computation such as the parallel loop range and the number of instructions and 2) our Java runtime generates these features for constructing training data. For the results reported in this paper, we constructed a prediction model with support vector machines (SVMs) after obtaining 291 samples by running 11 applications with different data sets and optimization levels. Our Java runtime then uses the SVMs to make predictions for unseen programs. Our experimental results on an IBM POWER8 platform with NVIDIA Tesla GPUs show that our prediction model predicts a faster configuration with up to 99.0% accuracy with 5-fold cross validation. Based on these results, we conclude that supervised machine-learning is a promising approach for building performance heuristics for mapping Java applications onto accelerators.

Tags: Code generation, Compilers, Computer science, CUDA, Java, Machine learning, nVidia, OpenCL, Tesla K40

October 27, 2015 by ahayashi

Rating: 1.5/5. From 2 votes.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org

Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection

Recent source codes

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluser of smartphones for edge computing application using TensorFlow

SYCL Container

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

Most viewed papers (last 30 days)

Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection

Share this:

Recent source codes

Most viewed papers (last 30 days)