Improved Fitness Optimization Landscapes for Sequence Design

Last update: Dec 20, 2022

Overview

ReLSO

Improved Fitness Optimization Landscapes for Sequence Design

Description
Citation
How to run
Training models
Original data source

Description

In recent years, deep learning approaches for determining protein sequence-fitness relationships have gained traction. Advances in high-throughput mutagenesis, directed evolution, and next-generation sequencing have allowed for the accumulation of large amounts of labelled fitness data and consequently, attracted the application of various deep learning methods. Although these methods learn an implicit fitness landscape, there is little work on using the latent encoding directly for protein sequence optimization. Here we show that this latent space representation of a fitness landscape can be made very amenable to latent space optimization through a joint-training process. We also show that this encoding strategy which also provides improvements to generalization over more traditional training strategies. We apply our approach to several biological contexts and show that latent space optimization in a smooth learned folding landscape allows for more accurate and efficient optimization of protein sequences.

Citation

This repo accompanies the following publication:

Egbert Castro, Abhinav Godavarthi, Julien Rubinfien, Smita Krishnaswamy. Guided Generative Protein Design using Regularized Transformers. Nature Machine Intelligence, in review (2021).

How to run

First, install dependencies

# clone project   
git clone https://github.com/KrishnaswamyLab/ReLSO-Guided-Generative-Protein-Design-using-Regularized-Transformers.git


# install project   
cd ReLSO-Guided-Generative-Protein-Design-using-Regularized-Transformers 
pip install -e .   
pip install -r requirements.txt

Usage

Training models

# run training script
python train_relso.py  --data gifford

*note: if arg option is not relevant to current model selection, it will not be used. See init method of each model to see what's used.

available dataset args:

    gifford, GB1_WU, GFP, TAPE

available auxnetwork args:

    base_reg

Original data sources

You might also like...

An implementation of a sequence to sequence neural network using an encoder-decoder

Keras implementation of a sequence to sequence model for time series prediction using an encoder-decoder architecture. I created this post to share a

195 Dec 17, 2022

Sequence lineage information extracted from RKI sequence data repo

Pango lineage information for German SARS-CoV-2 sequences This repository contains a join of the metadata and pango lineage tables of all German SARS-

24 Oct 26, 2022

Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Paper | Blog OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image gene

1.4k Jan 8, 2023

Aircraft design optimization made fast through modern automatic differentiation

Aircraft design optimization made fast through modern automatic differentiation. Plug-and-play analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

394 Dec 23, 2022

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

scikit-opt Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm,A

3.7k Jan 3, 2023

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

NLopt is a library for nonlinear local and global optimization, for functions with and without gradient information. It is designed as a simple, unifi

1.4k Dec 25, 2022

Racing line optimization algorithm in python that uses Particle Swarm Optimization.

Racing Line Optimization with PSO This repository contains a racing line optimization algorithm in python that uses Particle Swarm Optimization. Requi

6 Dec 14, 2022

Puzzle-CAM: Improved localization via matching partial and full features.

Puzzle-CAM The official implementation of "Puzzle-CAM: Improved localization via matching partial and full features".

150 Nov 14, 2022

[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Feel free to visit my homepage Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DIMP) [ECCVW2020 paper] Presentation

35 Oct 26, 2022

Comments

Conda env create not working
When I type in the command as instructed in how to run, I get this error:

Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you. Collecting package metadata (repodata.json): done Solving environment: failed

ResolvePackageNotFound:

libcxx==12.0.0=h2f01273_0

python==3.10.4=hdfd78df_0

openssl==1.1.1q=hca72f7f_0

ncurses==6.3=hca72f7f_3

readline==8.1.2=hca72f7f_1

bzip2==1.0.8=h1de35cc_0

ca-certificates==2022.07.19=hecd8cb5_0

xz==5.2.5=hca72f7f_1

libffi==3.3=hb1e8313_2

zlib==1.2.12=h4dc903c_2

sqlite==3.38.5=h707629a_0

tk==8.6.12=h5d9f67b_0
opened by Pixelatory 1
May the internal information of gifford data leads to a bias results given by model?

I'm very intersted in your work and analysize the gifford data. Firstly, I use the CD-HIT( a Cluster tool) split into different clusters.Then, I chose the sequence (comes the Clsuter-1(a cluster subset contaiing similar sequences given by CD-HIT)) with highest enrich value as a baseline, and focus on the residue difference between it and others sequences. Very interstingly, i find those sequences that containg 2 or 3 different residues compared to baseline sequence, usually have high enrichments. In Top-100 high enrichments, it can at 65%. As i know， your work is a multitask that both focus on generation and prediction. **I wonder that whether the JT-VAE tends to produce the new sequences that different from the corresponding baseline sequence with highest enrichment about 2 or 3 different residues , and the prediction neural network may think such sequences are good results.**It means that the model only need to realize the fact that compared to high enrich sequnces,the new sequnces contain 2 or 3 different residues is good enough. Beacuse i not find your results, i hope you can give me some advices.

opened by chengyunzhang 0

Releases(v1.0)

v1.0(Jul 31, 2022)

Published version of code and dataset.
Source code(tar.gz)
Source code(zip)

Owner

Krishnaswamy Lab

GitHub Repository

A library for performing coverage guided fuzzing of neural networks

TensorFuzz: Coverage Guided Fuzzing for Neural Networks This repository contains a library for performing coverage guided fuzzing of neural networks,

195 Dec 28, 2022

Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

MAUVE MAUVE is a library built on PyTorch and HuggingFace Transformers to measure the gap between neural text and human text with the eponymous MAUVE

182 Jan 02, 2023

Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Spatial-Temporal Transformer for Dynamic Scene Graph Generation Pytorch Implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Gra

119 Jan 01, 2023

R-Drop: Regularized Dropout for Neural Networks

R-Drop: Regularized Dropout for Neural Networks R-drop is a simple yet very effective regularization method built upon dropout, by minimizing the bidi

756 Dec 27, 2022

FPSAutomaticAiming——基于YOLOV5的FPS类游戏自动瞄准AI

FPSAutomaticAiming——基于YOLOV5的FPS类游戏自动瞄准AI 声明: 本项目仅限于学习交流，不可用于非法用途，包括但不限于：用于游戏外挂等，使用本项目产生的任何后果与本人无关！简介本项目基于yolov5,实现了一款FPS类游戏（CF、CSGO等）的自瞄AI，本项目旨在使用现

246 Dec 28, 2022

CVPR 2021 Challenge on Super-Resolution Space

Learning the Super-Resolution Space Challenge NTIRE 2021 at CVPR Learning the Super-Resolution Space challenge is held as a part of the 6th edition of

104 Oct 26, 2022

CVAT is free, online, interactive video and image annotation tool for computer vision

Computer Vision Annotation Tool (CVAT) CVAT is free, online, interactive video and image annotation tool for computer vision. It is being used by our

8.6k Jan 04, 2023

1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

This project releases our 1st place solution on ICDAR 2021 Competition on Mathematical Formula Detection. We implement our solution based on MMDetection, which is an open source object detection tool

94 Dec 25, 2022

Advanced Deep Learning with TensorFlow 2 and Keras (Updated for 2nd Edition)

1.5k Jan 03, 2023

Planner_backend - Academic planner application designed for students and counselors.

Planner (backend) Academic planner application designed for students and advisors.

2 Dec 31, 2021

Official implementation of "StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation" (SIGGRAPH 2021)

StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation This repository contains the official PyTorch implementation of the following

270 Dec 30, 2022

Keras implementations of Generative Adversarial Networks.

This repository has gone stale as I unfortunately do not have the time to maintain it anymore. If you would like to continue the development of it as

8.9k Jan 04, 2023

Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks"

Easy-To-Hard The official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks". Gett

52 Sep 08, 2022

PyTorch implementation of Federated Learning with Non-IID Data, and federated learning algorithms, including FedAvg, FedProx.

Federated Learning with Non-IID Data This is an implementation of the following paper: Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, Vik

48 Dec 29, 2022

Improved Fitness Optimization Landscapes for Sequence Design

Related tags

Overview

ReLSO

Description

Citation

How to run

Usage

Training models

available dataset args:

available auxnetwork args:

Original data sources

You might also like...

An implementation of a sequence to sequence neural network using an encoder-decoder

Sequence lineage information extracted from RKI sequence data repo

Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Aircraft design optimization made fast through modern automatic differentiation

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

Racing line optimization algorithm in python that uses Particle Swarm Optimization.

Puzzle-CAM: Improved localization via matching partial and full features.

[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Comments

Conda env create not working

May the internal information of gifford data leads to a bias results given by model?

Releases(v1.0)

v1.0(Jul 31, 2022)

Owner

Krishnaswamy Lab

A library for performing coverage guided fuzzing of neural networks

Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`.

Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

R-Drop: Regularized Dropout for Neural Networks

FPSAutomaticAiming——基于YOLOV5的FPS类游戏自动瞄准AI

CVPR 2021 Challenge on Super-Resolution Space

CVAT is free, online, interactive video and image annotation tool for computer vision

1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection

Advanced Deep Learning with TensorFlow 2 and Keras (Updated for 2nd Edition)

Planner_backend - Academic planner application designed for students and counselors.

Official implementation of "StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation" (SIGGRAPH 2021)

Keras implementations of Generative Adversarial Networks.

Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks"

PyTorch implementation of Federated Learning with Non-IID Data, and federated learning algorithms, including FedAvg, FedProx.

EFENet: Reference-based Video Super-Resolution with Enhanced Flow Estimation

Official implement of "CAT: Cross Attention in Vision Transformer".

Walk with fastai

Adversarial vulnerability of powerful near out-of-distribution detection

Kaggle Feedback Prize - Evaluating Student Writing 15th solution

SWA Object Detection