A Study on Neural-based Code Summarization in Low-resource Settings

Yang He
University of Technology Sydney, Faculty of Engineering and Information Technology
University of Technology Sydney, 2022


@phdthesis{he2022study,
   title={A Study on Neural-based Code Summarization in Low-resource Settings},
   author={He, Yang},
   school={University of Technology Sydney},
   year={2022}
}


Automated software engineering with deep learning techniques has been explored extensively thanks to breakthroughs in code representation learning. Many code intelligence approaches have been proposed for the downstream tasks in this field in recent years, yielding significant performance gains. Among these tasks, code summarization has been a central research topic because of its practical applications, e.g., software development and maintenance. It remains challenging to represent code snippets and generate accurate descriptions that summarize the functionality and semantics of programs. Existing code summarization methods have been devised to tackle real-world problems and have proven effective. However, little attention has been paid to their application to novel programming languages, where only a few well-documented programs in these low-resource languages are available for training. We observe that existing approaches achieve only poor performance in such settings, a problem we attribute to data hunger and the gaps between programming languages. Inspired by recent pre-training methods, we propose METASUM, a meta-learning-based code summarization model that extracts prior, shared knowledge from high-resource programming languages, where high-quality code snippets are easily accessible, and then adapts it to low-resource settings. The key contributions of this dissertation are that we (1) give a comprehensive overview of the development of machine-learning-based code summarization, (2) identify the new problem of low-resource code summarization and propose a meta-learning-based model that outperforms state-of-the-art pre-trained models by 3.18 and 1.79 BLEU points on the Nix and Ruby datasets, respectively, and (3) introduce a machine-learning-based toolkit, NATURALCC, for fair model comparison in the automated software engineering community.
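The high-resource-to-low-resource transfer described above follows the general meta-learning recipe: learn an initialization across many data-rich tasks so that a few gradient steps suffice on a data-poor one. The abstract does not spell out METASUM's training procedure, so the following is only a minimal first-order MAML-style sketch on a toy scalar-regression problem; the function names, the linear model, and all hyperparameters are illustrative assumptions, not the dissertation's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def task_batch(a, n=8):
    # Toy "task": a few (x, y) examples drawn from y = a * x.
    # Stands in for a (code, summary) corpus in one programming language.
    x = rng.uniform(-1, 1, n)
    return x, a * x

def loss_grad(w, x, y):
    # Squared-error loss of the scalar model y_hat = w * x,
    # and its gradient with respect to w.
    err = w * x - y
    return np.mean(err ** 2), np.mean(2 * err * x)

def maml_train(tasks, inner_lr=0.1, meta_lr=0.05, steps=200):
    # First-order MAML: for each "high-resource" task, take one inner
    # gradient step, then move the shared initialization along the
    # post-adaptation gradient (the first-order approximation).
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for a in tasks:
            x, y = task_batch(a)
            _, g = loss_grad(w, x, y)
            w_adapted = w - inner_lr * g           # inner (task-specific) step
            _, g_post = loss_grad(w_adapted, x, y)
            meta_grad += g_post                    # first-order meta-gradient
        w -= meta_lr * meta_grad / len(tasks)      # outer (meta) step
    return w

def adapt(w, a, inner_lr=0.1, shots=4, steps=5):
    # Few-shot adaptation on a "low-resource" task with only a few examples.
    x, y = task_batch(a, n=shots)
    for _ in range(steps):
        _, g = loss_grad(w, x, y)
        w -= inner_lr * g
    return w
```

In this sketch the meta-learned initialization lands near the center of the task distribution, so a handful of gradient steps on a few examples of an unseen task already move the model close to that task's optimum; the analogous claim for METASUM is that pre-training on high-resource languages yields an initialization that adapts quickly to languages like Nix or Ruby.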
