Tree LSTM implementation in PyTorch

Overview

Tree-Structured Long Short-Term Memory Networks

This is a PyTorch implementation of Tree-LSTM as described in the paper Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, and Christopher Manning. On the semantic similarity task using the SICK dataset, this implementation reaches:

  • Pearson's coefficient: 0.8492 and MSE: 0.2842 using hyperparameters --lr 0.010 --wd 0.0001 --optim adagrad --batchsize 25
  • Pearson's coefficient: 0.8674 and MSE: 0.2536 using hyperparameters --lr 0.025 --wd 0.0001 --optim adagrad --batchsize 25 --freeze_embed
  • Pearson's coefficient: 0.8676 and MSE: 0.2532 are the numbers reported in the original paper.
  • Known differences include the way the gradients are accumulated (normalized by batchsize or not).

Requirements

  • Python (tested on 3.6.5, should work on >=2.7)
  • Java >= 8 (for Stanford CoreNLP utilities)
  • Other dependencies are in requirements.txt Note: Currently works with PyTorch 0.4.0. Switch to the pytorch-v0.3.1 branch if you want to use PyTorch 0.3.1.

Usage

Before delving into how to run the code, here is a quick overview of the contents:

  • Use the script fetch_and_preprocess.sh to download the SICK dataset, Stanford Parser and Stanford POS Tagger, and Glove word vectors (Common Crawl 840) -- Warning: this is a 2GB download!), and additionally preprocees the data, i.e. generate dependency parses using Stanford Neural Network Dependency Parser.
  • main.pydoes the actual heavy lifting of training the model and testing it on the SICK dataset. For a list of all command-line arguments, have a look at config.py.
    • The first run caches GLOVE embeddings for words in the SICK vocabulary. In later runs, only the cache is read in during later runs.
    • Logs and model checkpoints are saved to the checkpoints/ directory with the name specified by the command line argument --expname.

Next, these are the different ways to run the code here to train a TreeLSTM model.

Local Python Environment

If you have a working Python3 environment, simply run the following sequence of steps:

- bash fetch_and_preprocess.sh
- pip install -r requirements.txt
- python main.py

Pure Docker Environment

If you want to use a Docker container, simply follow these steps:

- docker build -t treelstm .
- docker run -it treelstm bash
- bash fetch_and_preprocess.sh
- python main.py

Local Filesystem + Docker Environment

If you want to use a Docker container, but want to persist data and checkpoints in your local filesystem, simply follow these steps:

- bash fetch_and_preprocess.sh
- docker build -t treelstm .
- docker run -it --mount type=bind,source="$(pwd)",target="/root/treelstm.pytorch" treelstm bash
- python main.py

NOTE: Setting the environment variable OMP_NUM_THREADS=1 usually gives a speedup on the CPU. Use it like OMP_NUM_THREADS=1 python main.py. To run on a GPU, set the CUDA_VISIBLE_DEVICES instead. Usually, CUDA does not give much speedup here, since we are operating at a batchsize of 1.

Notes

  • (Apr 02, 2018) Added Dockerfile
  • (Apr 02, 2018) Now works on PyTorch 0.3.1 and Python 3.6, removed dependency on Python 2.7
  • (Nov 28, 2017) Added frozen embeddings, closed gap to paper.
  • (Nov 08, 2017) Refactored model to get 1.5x - 2x speedup.
  • (Oct 23, 2017) Now works with PyTorch 0.2.0.
  • (May 04, 2017) Added support for sparse tensors. Using the --sparse argument will enable sparse gradient updates for nn.Embedding, potentially reducing memory usage.
    • There are a couple of caveats, however, viz. weight decay will not work in conjunction with sparsity, and results from the original paper might not be reproduced using sparse embeddings.

Acknowledgements

Shout-out to Kai Sheng Tai for the original LuaTorch implementation, and to the Pytorch team for the fun library.

Contact

Riddhiman Dasgupta

This is my first PyTorch based implementation, and might contain bugs. Please let me know if you find any!

License

MIT

Owner
Riddhiman Dasgupta
Deep Learning, Science Fiction, Comic Books
Riddhiman Dasgupta
Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier

LSTMs for Human Activity Recognition Human Activity Recognition (HAR) using smartphones dataset and an LSTM RNN. Classifying the type of movement amon

Guillaume Chevalier 3.1k Dec 30, 2022
Spectralformer: Rethinking hyperspectral image classification with transformers

The code in this toolbox implements the "Spectralformer: Rethinking hyperspectral image classification with transformers". More specifically, it is detailed as follow.

Danfeng Hong 104 Jan 04, 2023
A static analysis library for computing graph representations of Python programs suitable for use with graph neural networks.

python_graphs This package is for computing graph representations of Python programs for machine learning applications. It includes the following modu

Google Research 258 Dec 29, 2022
Based on the given clinical dataset, Predict whether the patient having Heart Disease or Not having Heart Disease

Heart_Disease_Classification Based on the given clinical dataset, Predict whether the patient having Heart Disease or Not having Heart Disease Dataset

Ashish 1 Jan 30, 2022
Repository for the Bias Benchmark for QA dataset.

BBQ Repository for the Bias Benchmark for QA dataset. Authors: Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Tho

ML² AT CILVR 18 Nov 18, 2022
Official implementation of particle-based models (GNS and DPI-Net) on the Physion dataset.

Physion: Evaluating Physical Prediction from Vision in Humans and Machines [paper] Daniel M. Bear, Elias Wang, Damian Mrowca, Felix J. Binder, Hsiao-Y

Hsiao-Yu Fish Tung 18 Dec 19, 2022
The 2nd place solution of 2021 google landmark retrieval on kaggle.

Google_Landmark_Retrieval_2021_2nd_Place_Solution The 2nd place solution of 2021 google landmark retrieval on kaggle. Environment We use cuda 11.1/pyt

229 Dec 13, 2022
Lightweight library to build and train neural networks in Theano

Lasagne Lasagne is a lightweight library to build and train neural networks in Theano. Its main features are: Supports feed-forward networks such as C

Lasagne 3.8k Dec 29, 2022
Real-ESRGAN aims at developing Practical Algorithms for General Image Restoration.

Real-ESRGAN Colab Demo for Real-ESRGAN . Portable Windows executable file. You can find more information here. Real-ESRGAN aims at developing Practica

Xintao 17.2k Jan 02, 2023
Supervised 3D Pre-training on Large-scale 2D Natural Image Datasets for 3D Medical Image Analysis

Introduction This is an implementation of our paper Supervised 3D Pre-training on Large-scale 2D Natural Image Datasets for 3D Medical Image Analysis.

24 Dec 06, 2022
JDet is Object Detection Framework based on Jittor.

JDet is Object Detection Framework based on Jittor.

135 Dec 14, 2022
[NeurIPS 2021] “Improving Contrastive Learning on Imbalanced Data via Open-World Sampling”,

Improving Contrastive Learning on Imbalanced Data via Open-World Sampling Introduction Contrastive learning approaches have achieved great success in

VITA 24 Dec 17, 2022
Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo"

dblmahmc Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo" Requirements: https://github.com

1 Dec 17, 2021
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Feature Forge This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, e

Machinalis 380 Nov 05, 2022
Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains

Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains This is an accompanying repository to the ICAIL 2021 pap

4 Dec 16, 2021
Ontologysim: a Owlready2 library for applied production simulation

Ontologysim: a Owlready2 library for applied production simulation Ontologysim is an open-source deep production simulation framework, with an emphasi

10 Nov 30, 2022
DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021]

DiscoNet: Learning Distilled Collaboration Graph for Multi-Agent Perception [NeurIPS 2021] Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng

Automation and Intelligence for Civil Engineering (AI4CE) Lab @ NYU 98 Dec 21, 2022
This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language Models"

GreaseLM: Graph REASoning Enhanced Language Models This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language

137 Jan 02, 2023
Breaching - Breaching privacy in federated learning scenarios for vision and text

Breaching - A Framework for Attacks against Privacy in Federated Learning This P

Jonas Geiping 139 Jan 03, 2023
Toward Multimodal Image-to-Image Translation

BicycleGAN Project Page | Paper | Video Pytorch implementation for multimodal image-to-image translation. For example, given the same night image, our

Jun-Yan Zhu 1.4k Dec 22, 2022