Kaggle | 9th place (part of) solution for the Bristol-Myers Squibb – Molecular Translation challenge

Overview

Part of the 9th place solution for the Bristol-Myers Squibb – Molecular Translation challenge translating images containing chemical structures into InChI (International Chemical Identifier) texts.

This repo is partially based on the following resources:

Requirements

  • install and activate the conda environment
  • download and extract the data into /data/bms/
  • extract and move sample_submission_with_length.csv.gz into /data/bms/
  • tokenize training inputs: python datasets/prepocess2.py
  • if you want to use pseudo labeling, execute: python datasets/pseudo_prepocess2.py your_submission_file.csv
  • if you want to use external images, you can create with the following commands:
python r09_create_images_from_allowed_inchi.py
python datasets/extra_prepocess2.py 
  • and also install apex

Training

This repo supports training any VIT/SWIN/CAIT transformer models from timm as encoder together with the fairseq transformer decoder.

Here is an example configuration to train a SWIN swin_base_patch4_window12_384 as encoder and 12 layer 16 head fairseq decoder:

python -m torch.distributed.launch --nproc_per_node=N train.py --logdir=logdir/ \
    --pipeline --train-batch-size=50 --valid-batch-size=128 --dataload-workers-nums=10 --mixed-precision --amp-level=O2  \
    --aug-rotate90-p=0.5 --aug-crop-p=0.5 --aug-noise-p=0.9 --label-smoothing=0.1 \
    --encoder-lr=1e-3 --decoder-lr=1e-3 --lr-step-ratio=0.3 --lr-policy=step --optim=adam --lr-warmup-steps=1000 --max-epochs=20 --weight-decay=0 --clip-grad-norm=1 \
    --verbose --image-size=384 --model=swin_base_patch4_window12_384 --loss=ce --embed-dim=1024 --num-head=16 --num-layer=12 \
    --fold=0 --train-dataset-size=0 --valid-dataset-size=65536 --valid-dataset-non-sorted

For pseudo labeling, use --pseudo=pseudo.pkl. If you want subsample the pseudo dataset, use: --pseudo-dataset-size=448000. For using external images, use --extra (--extra-dataset-size=448000).

After training, you can also use Stochastic Weight Averaging (SWA) which gives a boost around 0.02:

python swa.py --image-size=384 --input logdir/epoch-17.pth,logdir/epoch-18.pth,logdir/epoch-19.pth,logdir/epoch-20.pth

Inference

Evaluation:

python -m torch.distributed.launch --nproc_per_node=N eval.py --mixed-precision --batch-size=128 swa_model.pth

Inference:

python -m torch.distributed.launch --nproc_per_node=N inference.py --mixed-precision --batch-size=128 swa_model.pth

Normalization with RDKit:

./normalize_inchis.sh submission.csv
Owner
Erdene-Ochir Tuguldur
Берлиний Техникийн Их Сургууль
Erdene-Ochir Tuguldur
Unicorn can be used for performance analyses of highly configurable systems with causal reasoning

Unicorn can be used for performance analyses of highly configurable systems with causal reasoning. Users or developers can query Unicorn for a performance task.

AISys Lab 27 Jan 05, 2023
Code for testing various M1 Chip benchmarks with TensorFlow.

M1, M1 Pro, M1 Max Machine Learning Speed Test Comparison This repo contains some sample code to benchmark the new M1 MacBooks (M1 Pro and M1 Max) aga

Daniel Bourke 348 Jan 04, 2023
This is an open source python repository for various python tests

Welcome to Py-tests This is an open source python repository for various python tests. This is in response to the hacktoberfest2021 challenge. It is a

Yada Martins Tisan 3 Oct 31, 2021
pytorch implementation of dftd2 & dftd3

torch-dftd pytorch implementation of dftd2 [1] & dftd3 [2, 3] Install # Install from pypi pip install torch-dftd # Install from source (for developer

33 Nov 28, 2022
Code of U2Fusion: a unified unsupervised image fusion network for multiple image fusion tasks, including multi-modal, multi-exposure and multi-focus image fusion.

U2Fusion Code of U2Fusion: a unified unsupervised image fusion network for multiple image fusion tasks, including multi-modal (VIS-IR, medical), multi

Han Xu 129 Dec 11, 2022
AWS provides a Python SDK, "Boto3" ,which can be used to access the AWS-account from the local.

Boto3 - The AWS SDK for Python Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to wri

Shreyas Srivastava 1 Oct 25, 2021
TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

912 Jan 08, 2023
Simple implementation of Mobile-Former on Pytorch

Simple-implementation-of-Mobile-Former At present, only the model but no trained. There may be some bug in the code, and some details may be different

Acheung 103 Dec 31, 2022
Drone detection using YOLOv5

This drone detection system uses YOLOv5 which is a family of object detection architectures and we have trained the model on Drone Dataset. Overview I

Tushar Sarkar 27 Dec 20, 2022
Koç University deep learning framework.

Knet Knet (pronounced "kay-net") is the Koç University deep learning framework implemented in Julia by Deniz Yuret and collaborators. It supports GPU

1.4k Dec 31, 2022
Code accompanying "Learning What To Do by Simulating the Past", ICLR 2021.

Learning What To Do by Simulating the Past This repository contains code that implements the Deep Reward Learning by Simulating the Past (Deep RSLP) a

Center for Human-Compatible AI 24 Aug 07, 2021
This is the code for the paper "Motion-Focused Contrastive Learning of Video Representations" (ICCV'21).

Motion-Focused Contrastive Learning of Video Representations Introduction This is the code for the paper "Motion-Focused Contrastive Learning of Video

11 Sep 23, 2022
Using Random Effects to Account for High-Cardinality Categorical Features and Repeated Measures in Deep Neural Networks

LMMNN Using Random Effects to Account for High-Cardinality Categorical Features and Repeated Measures in Deep Neural Networks This is the working dire

Giora Simchoni 10 Nov 02, 2022
2021-AIAC-QQ-Browser-Hyperparameter-Optimization-Rank6

2021-AIAC-QQ-Browser-Hyperparameter-Optimization-Rank6

Aigege 8 Mar 31, 2022
pytorch implementation for Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network arXiv:1609.04802

PyTorch SRResNet Implementation of Paper: "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network"(https://arxiv.org/abs

Jiu XU 436 Jan 09, 2023
This repository collects 100 papers related to negative sampling methods.

Negative-Sampling-Paper This repository collects 100 papers related to negative sampling methods, covering multiple research fields such as Recommenda

RUCAIBox 119 Dec 29, 2022
This is the latest version of the PULP SDK

PULP-SDK This is the latest version of the PULP SDK, which is under active development. The previous (now legacy) version, which is no longer supporte

78 Dec 07, 2022
Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Restormer: Efficient Transformer for High-Resolution Image Restoration Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,

Syed Waqas Zamir 906 Dec 30, 2022
This is the official code release for the paper Shape and Material Capture at Home

This is the official code release for the paper Shape and Material Capture at Home. The code enables you to reconstruct a 3D mesh and Cook-Torrance BRDF from one or more images captured with a flashl

89 Dec 10, 2022
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

DiffGAN-TTS - PyTorch Implementation PyTorch implementation of DiffGAN-TTS: High

Keon Lee 157 Jan 01, 2023