Long Code for Code Search

Fan Hu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Xirong Li
School of Information, Renmin University of China
arXiv:2208.11271 [cs.SE] (https://arxiv.org/pdf/2208.11271.pdf)

@misc{hu2022longcode,
   doi={10.48550/ARXIV.2208.11271},
   url={https://arxiv.org/abs/2208.11271},
   author={Hu, Fan and Wang, Yanlin and Du, Lun and Zhang, Hongyu and Han, Shi and Zhang, Dongmei and Li, Xirong},
   keywords={Software Engineering (cs.SE), FOS: Computer and information sciences},
   title={Long Code for Code Search},
   publisher={arXiv},
   year={2022},
   copyright={arXiv.org perpetual, non-exclusive license}
}

Thanks to Transformer-based pretraining models, the performance of code search has improved significantly. However, due to the computational cost of multi-head self-attention and limited GPU memory, there is a limit on the input token length. Existing pretrained code models, such as GraphCodeBERT, CodeBERT, and RoBERTa (code), take the first 256 tokens by default, which makes them unable to represent the complete information of long code (i.e., code longer than 256 tokens). Unlike a long text document, which can be regarded as a whole with complete semantics, the semantics of long code is discontinuous, as a piece of long code may contain different code modules. It is therefore unreasonable to directly apply long-text processing methods to long code. To tackle the long code problem, we propose MLCS (Modeling Long Code for Code Search) to obtain a better representation for long code. Our experimental results show the effectiveness of MLCS for long code retrieval. With MLCS, we can use Transformer-based pretraining models to model long code without changing their internal structure or re-pretraining them. Through AST-based splitting and attention-based fusion, MLCS achieves an overall mean reciprocal rank (MRR) score of 0.785, outperforming the previous state-of-the-art result of 0.713 on the public CodeSearchNet benchmark.
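
The abstract outlines a split-encode-fuse pipeline: break long code into AST-level segments, encode each segment independently with an unmodified pretrained encoder, and fuse the segment embeddings with attention. Below is a minimal sketch of that idea, not the authors' implementation: the split granularity (top-level functions/classes via Python's ast module), the additive attention form, and the helper names split_by_ast, encode_segments, and AttentionFusion are all illustrative assumptions.

# Sketch (not the paper's code): AST-based splitting + attention-based fusion
# around a frozen pretrained code encoder (GraphCodeBERT used as an example).
import ast

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "microsoft/graphcodebert-base"  # any BERT-style code encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)


def split_by_ast(source: str) -> list[str]:
    """Split Python source into top-level functions/classes (an assumed
    stand-in for the paper's AST-based splitting)."""
    tree = ast.parse(source)
    segments = [ast.get_source_segment(source, node) for node in tree.body]
    return [s for s in segments if s] or [source]


@torch.no_grad()
def encode_segments(segments: list[str]) -> torch.Tensor:
    """Encode each segment independently; each fits within the 256-token
    limit, so the encoder needs no architectural change or re-pretraining."""
    batch = tokenizer(segments, truncation=True, max_length=256,
                      padding=True, return_tensors="pt")
    out = encoder(**batch)
    return out.last_hidden_state[:, 0]  # [CLS] embedding per segment


class AttentionFusion(nn.Module):
    """Fuse a variable number of segment embeddings into one code vector
    with learned additive attention (an assumed form of the paper's
    attention-based fusion)."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, seg_emb: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.score(seg_emb), dim=0)  # (n_seg, 1)
        return (weights * seg_emb).sum(dim=0)                # (dim,)


code = '''
def add(a, b):
    return a + b

def mul(a, b):
    return a * b
'''
fusion = AttentionFusion()
code_vec = fusion(encode_segments(split_by_ast(code)))
print(code_vec.shape)  # torch.Size([768])

In a retrieval setup, one would presumably encode the natural-language query with the same pretrained model and rank candidates by cosine similarity against the fused code vectors; since the encoder is reused as-is, only the lightweight fusion layer needs training.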