Investigating Input Representations and Representation Models of Source Code for Machine Learning

Alexander Brauckmann
Technische Universität Dresden
TU Dresden, 2020








Machine learning methods are actively used to solve various tasks on source code, for example in compilers to improve the performance of executable code, or in IDEs to boost developer productivity. While the use cases are manifold, most of these methods rely on manually defined features, which require substantial engineering effort and are not necessarily optimal. In this thesis, we introduce a novel approach that encodes programs as graphs that include compiler-internal semantics, and we use the recently introduced class of Graph Neural Networks to learn task-specific features automatically. Specifically, we design a framework for learning program representations based on Abstract Syntax Trees and Control- and Dataflow Graphs, extracted with the Clang/LLVM compiler infrastructure. We empirically evaluate the approach in compiler heuristic use cases and show that it outperforms existing methods based on Recurrent Neural Networks (RNNs) in generalization performance and inference time. In the task of code generation, however, we show limitations of the graph-generative architecture we used, which cause a bias towards generating smaller and less complex samples.
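To make the idea of a program graph with typed edges more concrete, here is a minimal sketch in plain Python. It is not the thesis's implementation: the toy graph, the node names, the edge types "data" and "control", and the fixed scalar weights are all assumptions chosen for illustration, and the Clang/LLVM extraction step is not shown. A real Graph Neural Network would use learned weight matrices and vector-valued node features instead.

```python
from collections import defaultdict

# Hypothetical toy graph for a function like `return a + b;`.
# Node names and edge types are illustrative, not the thesis's schema.
NODES = {
    0: "param_a",
    1: "param_b",
    2: "add",
    3: "return",
}

# Each edge is (source, destination, edge_type).
EDGES = [
    (0, 2, "data"),
    (1, 2, "data"),
    (2, 3, "data"),
    (2, 3, "control"),
]

# Assumed fixed scalar weight per edge type; a real GNN would learn these.
EDGE_WEIGHTS = {"data": 1.0, "control": 0.5}


def message_passing_step(features, edges, edge_weights):
    """One round: each node adds the weighted sum of its predecessors' features."""
    incoming = defaultdict(float)
    for src, dst, kind in edges:
        incoming[dst] += edge_weights[kind] * features[src]
    return {node: value + incoming[node] for node, value in features.items()}


# Start with one scalar feature per node (all ones for simplicity).
initial = {node: 1.0 for node in NODES}
after_one = message_passing_step(initial, EDGES, EDGE_WEIGHTS)
after_two = message_passing_step(after_one, EDGES, EDGE_WEIGHTS)
```

After two rounds, the feature of the `return` node depends on both parameters through the `add` node, which illustrates how repeated message passing lets information flow along data and control edges.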