high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Reducing the Cost of Heuristic Generation with Machine Learning

Reducing the Cost of Heuristic Generation with Machine Learning

William F. Ogilvie

Institute for Computing Systems Architecture, School of Informatics, University of Edinburgh

University of Edinburgh, 2018

@article{ogilvie2018reducing,

title={Reducing the cost of heuristic generation with machine learning},

author={Ogilvie, William Fraser},

year={2018},

publisher={The University of Edinburgh}

}

Download (PDF)

View

Source

2208

views

The space of compile-time transformations and or run-time options which can improve the performance of a given code is usually so large as to be virtually impossible to search in any practical time-frame. Thus, heuristics are leveraged which can suggest good but not necessarily best configurations. Unfortunately, since such heuristics are tightly coupled to processor architecture performance is not portable; heuristics must be tuned, traditionally manually, for each device in turn. This is extremely laborious and the result is often outdated heuristics and less effective optimisation. Ideally, to keep up with changes in hardware and run-time environments a fast and automated method to generate heuristics is needed. Recent works have shown that machine learning can be used to produce mathematical models or rules in their place, which is automated but not necessarily fast. This thesis proposes the use of active machine learning, sequential analysis, and active feature acquisition to accelerate the training process in an automatic way, thereby tackling this timely and substantive issue. First, a demonstration of the efficiency of active learning over the previously standard supervised machine learning technique is presented in the form of an ensemble algorithm. This algorithm learns a model capable of predicting the best processing device in a heterogeneous system to use per workload size, per kernel. Active machine learning is a methodology which is sensitive to the cost of training; specifically, it is able to reduce the time taken to construct a model by predicting how much is expected to be learnt from each new training instance and then only choosing to learn from those most profitable examples. The exemplar heuristic is constructed on average 4x faster than a baseline approach, whilst maintaining comparable quality. Next, a combination of active learning and sequential analysis is presented which reduces both the number of samples per training example as well as the number of training examples overall. This allows for the creation of models based on noisy information, sacrificing accuracy per training instance for speed, without having a significant affect on the quality of the final product. In particular, the runtime of high-performance compute kernels is predicted from code transformations one may want to apply using a heuristic which was generated up to 26x faster than with active learning alone. Finally, preliminary work demonstrates that an automated system can be created which optimises both the number of training examples as well as which features to select during training to further substantially accelerate learning, in cases where each feature value that is revealed comes at some cost.

Tags: Computer science, Heterogeneous systems, Machine learning, nVidia, nVidia GeForce GTX Titan, OpenCL, Thesis

July 1, 2018 by hgpu

Rating: 2.0/5. From 1 vote.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Reducing the Cost of Heuristic Generation with Machine Learning

Your response

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)

Reducing the Cost of Heuristic Generation with Machine Learning

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)