high performance computing on graphics processing units: hgpu.org

NaturalCC: A Toolkit to Naturalize the Source Code Corpus

Yao Wan, Yang He, Jian-Guo Zhang, Yulei Sui, Hai Jin, Guandong Xu, Caiming Xiong, Philip S. Yu

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

arXiv:2012.03225 [cs.SE], (6 Dec 2020)

We present NaturalCC, an efficient and extensible toolkit that bridges the gap between natural language and programming language and facilitates research on big code analysis. Using NaturalCC, researchers from both the natural language and programming language communities can quickly and easily reproduce state-of-the-art baselines and implement their own approaches. NaturalCC is built upon Fairseq and PyTorch, providing (1) efficient computation with multi-GPU and mixed-precision data processing for fast model training, (2) a modular and extensible framework that makes it easy to reproduce or implement an approach for big code analysis, and (3) a command line interface and a graphical user interface to demonstrate each model’s performance. Currently, we have included several state-of-the-art baselines across different tasks (e.g., code completion, code comment generation, and code retrieval) for demonstration. The video of this demo is available.
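
To illustrate what a "modular and extensible framework" can look like in practice, here is a minimal sketch of a decorator-based registry, similar in spirit to the registration pattern Fairseq uses for models and tasks. Every name in it (`MODEL_REGISTRY`, `register_model`, `build_model`, `ToyCodeCompletion`) is hypothetical and chosen for illustration only; this is not NaturalCC's actual API.

```python
from typing import Callable, Dict, Type

# Hypothetical registry mapping a model name to its class.
# Assumption for illustration; NaturalCC's real internals may differ.
MODEL_REGISTRY: Dict[str, Type] = {}


def register_model(name: str) -> Callable[[Type], Type]:
    """Return a class decorator that records the class under `name`."""
    def decorator(cls: Type) -> Type:
        if name in MODEL_REGISTRY:
            raise ValueError(f"model '{name}' is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def build_model(name: str, **kwargs):
    """Look up a registered model by name and instantiate it."""
    try:
        cls = MODEL_REGISTRY[name]
    except KeyError:
        known = sorted(MODEL_REGISTRY)
        raise KeyError(f"unknown model '{name}'; known: {known}") from None
    return cls(**kwargs)


# A new approach plugs in by registering itself; no core code changes.
@register_model("toy_code_completion")
class ToyCodeCompletion:
    def __init__(self, hidden_size: int = 128):
        self.hidden_size = hidden_size


# A command line front end could then select the model by name.
model = build_model("toy_code_completion", hidden_size=256)
```

With a registry of this kind, adding a baseline amounts to defining one class and decorating it, which is one common way toolkits keep new tasks and models decoupled from the training loop.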

Tags: Computer science, CUDA, Deep learning, NLP, nVidia, Package, Programming Languages, Python, Video

December 13, 2020 by hgpu
