TensorFlow code for the neural network presented in the paper: "Structural Language Models of Code" (ICML'2020)

Overview

SLM: Structural Language Models of Code

This is an official implementation of the model described in:

"Structural Language Models of Code" [PDF]

To appear in ICML'2020.

An online demo is available at https://AnyCodeGen.org.

This repository currently contains the dataset and the data extractor that we used to create the Java dataset in the paper. The TensorFlow code will be released soon.

Feel free to open a new issue for any question. We always respond quickly.

Table of Contents

Requirements

  • python3
  • TensorFlow 1.13 or newer (install). To check TensorFlow version:

python3 -c 'import tensorflow as tf; print(tf.__version__)'

Download our preprocessed Java-small dataset

This dataset contains ~1.3M examples (1.1GB).

mkdir data
cd data
wget https://codegen-slm.s3.us-east-2.amazonaws.com/data/java-small-preprocessed.tar.gz
tar -xvzf java-small-preprocessed.tar.gz

This will create a data/java-small/ sub-directory, containing the files that hold training, test and validation sets, a dict file for various dataset properties and histograms, and a grammar file that is used during beam search to distinguish between terminal and non-terminal nodes.

Creating and preprocessing a new Java dataset

To create and preprocess a new dataset (for example, to compare SLM to a new model on another dataset):

  • Edit the file preprocess.sh using the instructions there, pointing it to the correct training, validation and test directories.
  • Run the preprocess.sh file:

bash preprocess.sh

Datasets

Java

To download the Java-small as raw *.java files, use:

To download the preprocessed dataset, use:

To download the dataset in a tokenized format that can be used in seq2seq models (for example, with OpenNMT-py), use:

The following JSON files are the files that are created by the JavaExtractor. The preprocessed and the seq2seq files are created from these JSON files:

Every line is a JSON object that contains the following fields: num_targets, num_nodes, targets, is_token, target_child_id, internal_paths, relative_paths, head_paths, head_root_path, head_child_id, linearized_tree, filepath, left_context, right_context, target_seq, line.

C#

The C# dataset that we used in the paper was created using the raw (*.cs files) dataset of Allamanis et al., 2018, (https://aka.ms/iclr18-prog-graphs-dataset) and can be found here: https://aka.ms/iclr18-prog-graphs-dataset.

To extract examples from the C# files, we modified the data extraction code of Brockschmidt et al., 2019: https://github.com/microsoft/graph-based-code-modelling/.

Querying the Trained Model

To query the trained model, use the following API, where MYCODE is the given code snippet, that includes two question marks (??) to mark the "hole" that should be completed:

curl -X POST https://w0w3uc4a63.execute-api.us-east-1.amazonaws.com/prod/predict -d '{"code": "MYCODE"}'

For example:

curl -X POST https://w0w3uc4a63.execute-api.us-east-1.amazonaws.com/prod/predict -d '{"code": "public static Path[] stat2Paths(FileStatus[] stats) {  if (stats == null) return null;  Path[] ret = new Path[stats.length]; for (int i = 0; i < stats.length; ++i) { ret[i] = ??; } return ret; }"}'

Citation

Structural Language Models of Code

@article{alon2019structural,
  title={Structural Language Models of Code},
  author={Alon, Uri and Sadaka, Roy and Levy, Omer and Yahav, Eran},
  journal={arXiv preprint arXiv:1910.00577},
  year={2019}
}
Colour detection is necessary to recognize objects, it is also used as a tool in various image editing and drawing apps.

Colour Detection On Image Colour detection is the process of detecting the name of any color. Simple isn’t it? Well, for humans this is an extremely e

Astitva Veer Garg 1 Jan 13, 2022
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

MoEBERT This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). Installation Create an

Simiao Zuo 34 Dec 24, 2022
Implement some metaheuristics and cost functions

Metaheuristics This repot implement some metaheuristics and cost functions. Metaheuristics JAYA Implement Jaya optimizer without constraints. Cost fun

Adri1G 1 Mar 23, 2022
Gauge equivariant mesh cnn

Geometric Mesh CNN The code in this repository is an implementation of the Gauge Equivariant Mesh CNN introduced in the paper Gauge Equivariant Mesh C

50 Dec 18, 2022
Council-GAN - Implementation for our paper Breaking the Cycle - Colleagues are all you need (CVPR 2020)

Council-GAN Implementation of our paper Breaking the Cycle - Colleagues are all you need (CVPR 2020) Paper Ori Nizan , Ayellet Tal, Breaking the Cycle

ori nizan 260 Nov 16, 2022
Implementation of "Debiasing Item-to-Item Recommendations With Small Annotated Datasets" (RecSys '20)

Debiasing Item-to-Item Recommendations With Small Annotated Datasets This is the code for our RecSys '20 paper. Other materials can be found here: Ful

Microsoft 34 Aug 10, 2022
ROS-UGV-Control-Interface - Control interface which can be used in any UGV

ROS-UGV-Control-Interface Cam Closed: Cam Opened:

Ahmet Fatih Akcan 1 Nov 04, 2022
Codes for AAAI 2022 paper: Context-aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs

Context-Aware-Healthcare Codes for AAAI 2022 paper: Context-aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs Download

LuChang 9 Dec 26, 2022
CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer

CSAW-M This repository contains code for CSAW-M: An Ordinal Classification Dataset for Benchmarking Mammographic Masking of Cancer. Source code for tr

Yue Liu 7 Oct 11, 2022
Using deep actor-critic model to learn best strategies in pair trading

Deep-Reinforcement-Learning-in-Stock-Trading Using deep actor-critic model to learn best strategies in pair trading Abstract Partially observed Markov

281 Dec 09, 2022
Numerical Methods with Python, Numpy and Matplotlib

Numerical Bric-a-Brac Collections of numerical techniques with Python and standard computational packages (Numpy, SciPy, Numba, Matplotlib ...). Diffe

Vincent Bonnet 10 Dec 20, 2021
Create and implement a deep learning library from scratch.

In this project, we create and implement a deep learning library from scratch. Table of Contents Deep Leaning Library Table of Contents About The Proj

Rishabh Bali 22 Aug 23, 2022
CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors

CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors   In order to facilitate the res

yujmo 11 Dec 12, 2022
A PyTorch implementation of Mugs proposed by our paper "Mugs: A Multi-Granular Self-Supervised Learning Framework".

Mugs: A Multi-Granular Self-Supervised Learning Framework This is a PyTorch implementation of Mugs proposed by our paper "Mugs: A Multi-Granular Self-

Sea AI Lab 62 Nov 08, 2022
wlad 2 Dec 19, 2022
Code and datasets for TPAMI 2021

SkeletonNet This repository constains the codes and ShapeNetV1-Surface-Skeleton,ShapNetV1-SkeletalVolume and 2d image datasets ShapeNetRendering. Plea

34 Aug 15, 2022
Another pytorch implementation of FCN (Fully Convolutional Networks)

FCN-pytorch-easiest Trying to be the easiest FCN pytorch implementation and just in a get and use fashion Here I use a handbag semantic segmentation f

Y. Dong 158 Dec 21, 2022
Proposed n-stage Latent Dirichlet Allocation method - A Novel Approach for LDA

n-stage Latent Dirichlet Allocation (n-LDA) Proposed n-LDA & A Novel Approach for classical LDA Latent Dirichlet Allocation (LDA) is a generative prob

Anıl Güven 4 Mar 07, 2022
A chemical analysis of lipophilicities & molecule drawings including ML

A chemical analysis of lipophilicity & molecule drawings including a bit of ML analysis. This is a simple project that includes two Jupyter files (one

Aurimas A. Nausėdas 7 Nov 22, 2022
Python Implementation of algorithms in Graph Mining, e.g., Recommendation, Collaborative Filtering, Community Detection, Spectral Clustering, Modularity Maximization, co-authorship networks.

Graph Mining Author: Jiayi Chen Time: April 2021 Implemented Algorithms: Network: Scrabing Data, Network Construbtion and Network Measurement (e.g., P

Jiayi Chen 3 Mar 03, 2022