A whale detector design for the Kaggle whale-detector challenge!

Overview

CNN (InceptionV1) + STFT based Whale Detection Algorithm

So, this repository is my PyTorch solution for the Kaggle whale-detection challenge. The objective of this challenge was to basically do a binary classification, (hence really a detection), on the existance of whale signals in the water.

It's a pretty cool problem that resonates with prior work I have done in underwater perception algorithm design - a freakishly hard problem I may add. (The speed of sound changes on you, multiple reflections from the environment, but probably the hardest of all being that it's hard to gather ground-truth). (<--- startup idea? 💥 )

Anyway! My approach is to first transform the 1D acoustic time-domain signal into a 2D time-frequency representation via the Short-Time-Fourier-Transform (STFT). We do this in the following way:

(Where K_F is the raw number of STFT frequency bands, n is the discrete time index, m is the temporal index of each STFT pixel, x[n] the raw audio signal being transformed, and k representing the index of each STFT pixel's frequency). In this way, we break the signal down into it's constituent time-frequency energy cells, (which are now pixels), but more crucially, we get a representation that has distinct features across time and frequency that will be correlated with each other. This then makes it ripe for a Convolutional Neural Network (CNN) to chew into.

Here is what a whale-signal's STFT looks like:

Pos whale spectrogram

Similarly, here's what a signal's STFT looks like without any whale signal. (Instead, there seems to be some short-time but uber wide band interference at some point in time).

Neg whale spectrogram

It's actually interesting, because there are basically so many more ways in which a signal can manifest itself as not a whale signal, VS as actually being a whale signal. Does that mean we can also frame the problem as learning the manifold of whale-signals and simply do outlier analysis on that? Something to think about. :)

Code Usage:

Ok - let us now talk about how to use the code:

The first thing you need to do is install PyTorch of course. Do this from here. I use a conda environment as they recommend, and I recommend you do the same.

Once this is done, activate your PyTorch environment.

Now we need to download the raw data. You can get that from Kaggle's site here. Unzip this data at a directory of your choosing. For the purpose of this tutorial, I am going to assume that you placed and unzipped the data as such: /Users/you/data/whaleData/. (We will only be using the training data so that we can split it into train/val/test. The reason is that we do not have access to Kaggle's test labels).

We are now going to do the following steps:

  • Convert the audio files into numpy STFT tensors:
    • python whaleDataCreatorToNumpy.py -s 1 -dataDir /Users/you/data/whaleData/train/ -labelcsv /Users/you/data/whaleData/train.csv -dataDirProcessed /Users/you/data/whaleData/processedData/ -ds 0.42 -rk 20 200
    • The -s 1 flag says we want to save the results, the -ds 0.42 says we want to downsample the STFT image by this amount, (to help with computation time), and the -rk 20 200 says that we want the "rows kept" to be indexed from 20 to 200. This is because the STFT is conjugate symmetric, but also because we make a determination by first swimming in the data, (I swear this pun is not intentional), that most of the informational content lies between those bands. (Again, the motivation is computational here as well).
  • Convert and split the STFT tensors into PyTorch training/val/test Torch tensors:
    • python whaleDataCreatorNumpyToTorchTensors.py -numpyDataDir /Users/you/data/whaleData/processedData/
    • Here, the original numpy tensors are first split and normalized, and then saved off into PyTorch tensors. (The split percentages are able to be user defined, I set the defaults set 20% for validation and 10% test). The PyTorch tensors are saved in the same directory as above.
  • Run the CNN classifier!
    • We are now ready to train the classifier! I have already designed an Inception-V1 CNN architecture, that can be loaded up automatically, and we can use this as so. The input dimensions are also guaranteed to be equal to the STFT image sizes here. At any rate, we do this like so:
    • python whaleClassifier.py -dataDirProcessed /Users/you/data/whaleData/processedData/ -g 0 -e 1 -lr 0.0002 -L2 0.01 -mb 4 -dp 0 -s 3 -dnn 'inceptionModuleV1_75x45'
    • The g term controls whether or not we want to use a GPU to trian, e controls the number of epochs we want to train over, lr is the learning rate, L2 is the L2 penalization amount for regularization, mb is the minibatch size, (which will be double this as the training composes a mini-batch to have an equal number of positive and negative samples), dp controls data parallelism (moot without multiple GPUs, and is really just a flag on whether or not to use multiple GPUs), s controls when and how often we save the net weights and validation losses, (option 3 saves the best performing model), and finally, -dnn is a flag that controls which DNN architecture we want to use. In this way, you can write your own DNN arch, and then simply call it by whatever name you give it for actual use. (I did this after I got tired of hard-coding every single DNN I designed).
    • If everything is running smoothly, you should see something like this as training progresses:
    • The "time" here just shows how long it takes between the reporting of each validation score. (Since I ran this on my CPU, it's 30 seconds / report, but expect this to be at least an order of magnitude faster on a respectable GPU).
  • Evauluate the results!
    • When your training is complete, you can then then run this script to give you automatically generated ROC and PR curves for your network's performance:
    • python resultsVisualization.py -dataDirProcessed /Users/you/data/whaleData/processedData/ -netDir .
    • After a good training session, you should get results that look like so:
    • I also show the normalized training / validation likelihoods and accuracies for the duration of the session:

So wow! An AUC of 0.9669! Not too shabby! Can still be improved, but considering the data looks like this below, our InceptionV1-CNN isn't doing too bad either. 💥

Owner
Tarin Ziyaee
Eng Manager @Facebook FRL neural interfaces | Director R&D @CTRL-labs neural inferfaces. | CTO @Voyage, autonomous vehicles | Perception @Apple Autonomous
Tarin Ziyaee
Like Dirt-Samples, but cleaned up

Clean-Samples Like Dirt-Samples, but cleaned up, with clear provenance and license info (generally a permissive creative commons licence but check the

TidalCycles 39 Nov 30, 2022
Data and extra materials for the food safety publications classifier

Data and extra materials for the food safety publications classifier The subdirectories contain detailed descriptions of their contents in the README.

1 Jan 20, 2022
EgGateWayGetShell py脚本

EgGateWayGetShell_py 免责声明 由于传播、利用此文所提供的信息而造成的任何直接或者间接的后果及损失,均由使用者本人负责,作者不为此承担任何责任。 使用 python3 eg.py urls.txt 目标 title:锐捷网络-EWEB网管系统 port:4430 漏洞成因 ?p

榆木 61 Nov 09, 2022
Python Jupyter kernel using Poetry for reproducible notebooks

Poetry Kernel Use per-directory Poetry environments to run Jupyter kernels. No need to install a Jupyter kernel per Python virtual environment! The id

Pathbird 204 Jan 04, 2023
pytorch implementation of openpose including Hand and Body Pose Estimation.

pytorch-openpose pytorch implementation of openpose including Body and Hand Pose Estimation, and the pytorch model is directly converted from openpose

Hzzone 1.4k Jan 07, 2023
Segmentation vgg16 fcn - cityscapes

VGGSegmentation Segmentation vgg16 fcn - cityscapes Priprema skupa skripta prepare_dataset_downsampled.py Iz slika cityscapesa izrezuje haubu automobi

6 Oct 24, 2020
RefineGNN - Iterative refinement graph neural network for antibody sequence-structure co-design (RefineGNN)

Iterative refinement graph neural network for antibody sequence-structure co-des

Wengong Jin 83 Dec 31, 2022
Code for the Higgs Boson Machine Learning Challenge organised by CERN & EPFL

A method to solve the Higgs boson challenge using Least Squares - Novae This project is the Project 1 of EPFL CS-433 Machine Learning. The project is

Giacomo Orsi 1 Nov 09, 2021
Download files from DSpace systems (because for some reason DSpace won't let you)

DSpaceDL A tool for downloading files from DSpace items. For some reason, DSpace systems have a dogshit UI, and Universities absolutely LOOOVE to use

Soumitra Shewale 5 Dec 01, 2022
Determined: Deep Learning Training Platform

Determined: Deep Learning Training Platform Determined is an open-source deep learning training platform that makes building models fast and easy. Det

Determined AI 2k Dec 31, 2022
Educational 2D SLAM implementation based on ICP and Pose Graph

slam-playground Educational 2D SLAM implementation based on ICP and Pose Graph How to use: Use keyboard arrow keys to navigate robot. Press 'r' to vie

Kirill 19 Dec 17, 2022
Playable Video Generation

Playable Video Generation Playable Video Generation Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci Paper: ArX

Willi Menapace 136 Dec 31, 2022
This is an example implementation of the paper "Cross Domain Robot Imitation with Invariant Representation".

IR-GAIL This is an example implementation of the paper "Cross Domain Robot Imitation with Invariant Representation". Dependency The experiments are de

Zhao-Heng Yin 1 Jul 14, 2022
A spatial genome aligner for analyzing multiplexed DNA-FISH imaging data.

jie jie is a spatial genome aligner. This package parses true chromatin imaging signal from noise by aligning signals to a reference DNA polymer model

Bojing Jia 9 Sep 29, 2022
Python implementation of Project Fluent

Project Fluent This is a collection of Python packages to use the Fluent localization system. python-fluent consists of these packages: fluent.syntax

Project Fluent 155 Dec 28, 2022
Use graph-based analysis to re-classify stocks and to improve Markowitz portfolio optimization

Dynamic Stock Industrial Classification Use graph-based analysis to re-classify stocks and experiment different re-classification methodologies to imp

Sheng Yang 10 Dec 05, 2022
A Kernel fuzzer focusing on race bugs

Razzer: Finding kernel race bugs through fuzzing Environment setup $ source scripts/envsetup.sh scripts/envsetup.sh sets up necessary environment var

Systems and Software Security Lab at Seoul National University (SNU) 328 Dec 26, 2022
codes for paper Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level sentiment classification

DLCF-DCA codes for paper Combining Dynamic Local Context Focus and Dependency Cluster Attention for Aspect-level sentiment classification. submitted t

15 Aug 30, 2022
Code for the paper BERT might be Overkill: A Tiny but Effective Biomedical Entity Linker based on Residual Convolutional Neural Networks

Biomedical Entity Linking This repo provides the code for the paper BERT might be Overkill: A Tiny but Effective Biomedical Entity Linker based on Res

Tuan Manh Lai 24 Oct 24, 2022
ICON: Implicit Clothed humans Obtained from Normals (CVPR 2022)

ICON: Implicit Clothed humans Obtained from Normals Yuliang Xiu · Jinlong Yang · Dimitrios Tzionas · Michael J. Black CVPR 2022 News 🚩 [2022/04/26] H

Yuliang Xiu 1.1k Jan 04, 2023