STFT_Transformer

Code for the STFT Transformer used in the BirdCLEF 2021 competition.

The STFT Transformer is a new way to apply Transformers to audio data, similar to how Vision Transformers are applied to images. It was developed for the BirdCLEF 2021 competition hosted on Kaggle. The PDF document, which has been submitted to the BirdCLEF 2021 workshop, gives more context.
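To make the Vision Transformer analogy concrete, here is a minimal sketch of how a waveform could be turned into a sequence of tokens from its short-time Fourier transform. This is not the code from this repository: the function names, the window and hop sizes, and the choice to group consecutive frames into one token are all assumptions made for illustration. The PDF document describes the actual model.

```python
import numpy as np


def stft_magnitude(signal, n_fft=256, hop=128):
    """Magnitude STFT of a 1-D signal, shape (n_frames, n_fft // 2 + 1).

    n_fft and hop are illustrative values, not the ones used in the competition.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def frames_to_tokens(spec, frames_per_token=4):
    """Group consecutive STFT frames into flat tokens (an assumed patching scheme)."""
    n_tokens = spec.shape[0] // frames_per_token
    trimmed = spec[: n_tokens * frames_per_token]  # drop leftover frames
    return trimmed.reshape(n_tokens, frames_per_token * spec.shape[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(32000)  # stand-in for one second of audio at 32 kHz
    tokens = frames_to_tokens(stft_magnitude(audio))
    print(tokens.shape)  # each row would be fed to a Transformer as one token
```

In a Vision Transformer each token is a flattened image patch; in this sketch each token is a flattened block of spectrogram frames, which a Transformer encoder could then consume after a linear projection.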

The code is provided as is; it has not been rewritten. Because competitions are done in a hurry, the code may not meet the usual open source standards.

The code assumes this directory structure:

<base_dir>/code

<base_dir>/input

<base_dir>/input/freefield1010

<base_dir>/checkpoints

<base_dir>/data
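The layout above can be created or checked with a few lines of Python. This is a convenience sketch and not part of the repository; `ensure_layout` is a hypothetical helper name, and the base directory is whatever you choose as `<base_dir>`.

```python
from pathlib import Path

# Sub-directories listed in the layout above, relative to <base_dir>.
EXPECTED_DIRS = ["code", "input", "input/freefield1010", "checkpoints", "data"]


def ensure_layout(base_dir):
    """Create any missing directories under base_dir and return all their paths."""
    base = Path(base_dir)
    paths = []
    for rel in EXPECTED_DIRS:
        path = base / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths


if __name__ == "__main__":
    for p in ensure_layout("."):  # "." stands in for <base_dir>
        print(p)
```

Existing directories are left untouched, so the helper is safe to run after the competition data has already been downloaded.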

The code has to be run from the code directory. The competition data has to be downloaded into the input directory, and the freefield1010 data into the freefield1010 directory. Run data_final.py first: it reads the audio files from input and stores the relevant parts in the data directory as numpy files.

Then stft_transformer_final.py can be run to train a model on one fold. During the competition I ran 5 folds by editing the FOLD global variable in the script (I know, this is substandard).
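One way to avoid editing the script for each fold would be to read the fold from the command line. The sketch below is only a suggestion: the script as published does not accept a `--fold` flag, `parse_fold` is a hypothetical helper, and numbering the folds 0 to 4 is an assumption.

```python
import argparse

N_FOLDS = 5  # the number of folds trained during the competition


def parse_fold(argv=None):
    """Return the fold index given on the command line (assumed range 0..4)."""
    parser = argparse.ArgumentParser(description="Select the fold to train.")
    parser.add_argument(
        "--fold",
        type=int,
        choices=range(N_FOLDS),
        default=0,
        help="index of the fold to train",
    )
    return parser.parse_args(argv).fold


if __name__ == "__main__":
    FOLD = parse_fold()  # would replace the hand-edited FOLD global
    print(f"Training fold {FOLD}")
```

With such a change, the five models could be trained by calling the script once per fold from a shell loop instead of editing the source between runs.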

Once all 5 models are trained, one can upload the weights to a Kaggle dataset and use the submission notebook I used. This should yield a score good enough for 15th place in the competition. Achieving this rank with a single model is significant, as all top teams used an ensemble of models.


Owner

Jean-François Puget
