Seq2seq - Sequence to Sequence Learning with Keras

Related tags

Deep Learningseq2seq
Overview

Seq2seq

Sequence to Sequence Learning with Keras

Hi! You have just found Seq2Seq. Seq2Seq is a sequence to sequence learning add-on for the python deep learning library Keras. Using Seq2Seq, you can build and train sequence-to-sequence neural network models in Keras. Such models are useful for machine translation, chatbots (see [4]), parsers, or whatever that comes to your mind.

seq2seq

Getting started

Seq2Seq contains modular and reusable layers that you can use to build your own seq2seq models as well as built-in models that work out of the box. Seq2Seq models can be compiled as they are or added as layers to a bigger model. Every Seq2Seq model has 2 primary layers : the encoder and the decoder. Generally, the encoder encodes the input sequence to an internal representation called 'context vector' which is used by the decoder to generate the output sequence. The lengths of input and output sequences can be different, as there is no explicit one on one relation between the input and output sequences. In addition to the encoder and decoder layers, a Seq2Seq model may also contain layers such as the left-stack (Stacked LSTMs on the encoder side), the right-stack (Stacked LSTMs on the decoder side), resizers (for shape compatibility between the encoder and the decoder) and dropout layers to avoid overfitting. The source code is heavily documented, so lets go straight to the examples:

A simple Seq2Seq model:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8)
model.compile(loss='mse', optimizer='rmsprop')

That's it! You have successfully compiled a minimal Seq2Seq model! Next, let's build a 6 layer deep Seq2Seq model (3 layers for encoding, 3 layers for decoding).

Deep Seq2Seq models:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8, depth=3)
model.compile(loss='mse', optimizer='rmsprop')

Notice that we have specified the depth for both encoder and decoder as 3, and your model has a total depth of 3 + 3 = 6. You can also specify different depths for the encoder and the decoder. Example:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=20, depth=(4, 5))
model.compile(loss='mse', optimizer='rmsprop')

Notice that the depth is specified as tuple, (4, 5). Which means your encoder will be 4 layers deep whereas your decoder will be 5 layers deep. And your model will have a total depth of 4 + 5 = 9.

Advanced Seq2Seq models:

Until now, you have been using the SimpleSeq2Seq model, which is a very minimalistic model. In the actual Seq2Seq implementation described in [1], the hidden state of the encoder is transferred to decoder. Also, the output of decoder at each timestep becomes the input to the decoder at the next time step. To make things more complicated, the hidden state is propogated throughout the LSTM stack. But you have no reason to worry, as we have a built-in model that does all that out of the box. Example:

import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

Note that we had to specify the complete input shape, including the samples dimensions. This is because we need a static hidden state(similar to a stateful RNN) for transferring it across layers. (Update : Full input shape is not required in the latest version, since we switched to Recurrent Shop backend). By the way, Seq2Seq models also support the stateful argument, in case you need it.

You can also experiment with the hidden state propogation turned off. Simply set the arguments broadcast_state and inner_broadcast_state to False.

Peeky Seq2seq model:

Let's not stop there. Let's build a model similar to cho et al 2014, where the decoder gets a 'peek' at the context vector at every timestep.

cho et al 2014

To achieve this, simply add the argument peek=True:

import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4, peek=True)
model.compile(loss='mse', optimizer='rmsprop')

Seq2seq model with attention:

Attention Seq2seq

Let's not stop there either. In all the models described above, there is no allignment between the input sequence elements and the output sequence elements. But for machine translation, learning a soft allignment between the input and output sequences imporves performance.[3]. The Seq2seq framework includes a ready made attention model which does the same. Note that in the attention model, there is no hidden state propogation, and a bidirectional LSTM encoder is used by default. Example:

import seq2seq
from seq2seq.models import AttentionSeq2Seq

model = AttentionSeq2Seq(input_dim=5, input_length=7, hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

As you can see, in the attention model you need not specify the samples dimension as there are no static hidden states involved(But you have to if you are building a stateful Seq2seq model). Note: You can set the argument bidirectional=False if you wish not to use a bidirectional encoder.

Final Words

That's all for now. Hope you love this library. For any questions you might have, create an issue and I will get in touch. You can also contribute to this project by reporting bugs, adding new examples, datasets or models.

Installation:

sudo pip install git+https://github.com/farizrahman4u/seq2seq.git

Requirements:

Working Example:

Papers:

Owner
Fariz Rahman
Fariz Rahman
Dual Attention Network for Scene Segmentation (CVPR2019)

Dual Attention Network for Scene Segmentation(CVPR2019) Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang,and Hanqing Lu Introduction W

Jun Fu 2.2k Dec 28, 2022
这是一个yolo3-tf2的源码,可以用于训练自己的模型。

YOLOV3:You Only Look Once目标检测模型在Tensorflow2当中的实现 目录 性能情况 Performance 所需环境 Environment 文件下载 Download 训练步骤 How2train 预测步骤 How2predict 评估步骤 How2eval 参考资料

Bubbliiiing 68 Dec 21, 2022
Implementation of Wasserstein adversarial attacks.

Stronger and Faster Wasserstein Adversarial Attacks Code for Stronger and Faster Wasserstein Adversarial Attacks, appeared in ICML 2020. This reposito

21 Oct 06, 2022
implicit displacement field

Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields [project page][paper][cite] Geometry-Consistent Neural Shape Represe

Yifan Wang 100 Dec 19, 2022
Implementation of Convolutional enhanced image Transformer

CeiT : Convolutional enhanced image Transformer This is an unofficial PyTorch implementation of Incorporating Convolution Designs into Visual Transfor

Rishikesh (ऋषिकेश) 82 Dec 13, 2022
Omniverse sample scripts - A guide for developing with Python scripts on NVIDIA Ominverse

Omniverse sample scripts ここでは、NVIDIA Omniverse ( https://www.nvidia.com/ja-jp/om

ft-lab (Yutaka Yoshisaka) 37 Nov 17, 2022
Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets

Crowd-Kit: Computational Quality Control for Crowdsourcing Documentation Crowd-Kit is a powerful Python library that implements commonly-used aggregat

Toloka 125 Dec 30, 2022
Caffe-like explicit model constructor. C(onfig)Model

cmodel Caffe-like explicit model constructor. C(onfig)Model Installation pip install git+https://github.com/bonlime/cmodel Usage In order to allow usi

1 Feb 18, 2022
《Single Image Reflection Removal Beyond Linearity》(CVPR 2019)

Single-Image-Reflection-Removal-Beyond-Linearity Paper Single Image Reflection Removal Beyond Linearity. Qiang Wen, Yinjie Tan, Jing Qin, Wenxi Liu, G

Qiang Wen 51 Jun 24, 2022
Code for MSc Quantitative Finance Dissertation

MSc Dissertation Code ReadMe Sector Volatility Prediction Performance Using GARCH Models and Artificial Neural Networks Curtis Nybo MSc Quantitative F

2 Dec 01, 2022
code for generating data set ES-ImageNet with corresponding training code

es-imagenet-master code for generating data set ES-ImageNet with corresponding training code dataset generator some codes of ODG algorithm The variabl

Ordinarabbit 18 Dec 25, 2022
Earth Vision Foundation

EVer - A Library for Earth Vision Researcher EVer is a Pytorch-based Python library to simplify the training and inference of the deep learning model.

Zhuo Zheng 34 Nov 26, 2022
Source code and data from the RecSys 2020 article "Carousel Personalization in Music Streaming Apps with Contextual Bandits" by W. Bendada, G. Salha and T. Bontempelli

Carousel Personalization in Music Streaming Apps with Contextual Bandits - RecSys 2020 This repository provides Python code and data to reproduce expe

Deezer 48 Jan 02, 2023
Lightweight Python library for adding real-time object tracking to any detector.

Norfair is a customizable lightweight Python library for real-time 2D object tracking. Using Norfair, you can add tracking capabilities to any detecto

Tryolabs 1.7k Jan 05, 2023
Real time sign language recognition

The proposed work aims at converting american sign language gestures into English that can be understood by everyone in real time.

Mohit Kaushik 6 Jun 13, 2022
Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data arXiv This is the code base for weakly supervised NER. We provide a

Amazon 92 Jan 04, 2023
DeepFill v1/v2 with Contextual Attention and Gated Convolution, CVPR 2018, and ICCV 2019 Oral

Generative Image Inpainting An open source framework for generative image inpainting task, with the support of Contextual Attention (CVPR 2018) and Ga

2.9k Dec 16, 2022
Official project repository for 'Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination'

NCAE_UAD Official project repository of 'Normality-Calibrated Autoencoder for Unsupervised Anomaly Detection on Data Contamination' Abstract In this p

Jongmin Andrew Yu 2 Feb 10, 2022
DiffStride: Learning strides in convolutional neural networks

DiffStride is a pooling layer with learnable strides. Unlike strided convolutions, average pooling or max-pooling that require cross-validating stride values at each layer, DiffStride can be initiali

Google Research 113 Dec 13, 2022
Self-Supervised Pillar Motion Learning for Autonomous Driving (CVPR 2021)

Self-Supervised Pillar Motion Learning for Autonomous Driving Chenxu Luo, Xiaodong Yang, Alan Yuille Self-Supervised Pillar Motion Learning for Autono

QCraft 101 Dec 05, 2022