A whale detector design for the Kaggle whale-detector challenge!

Overview

CNN (InceptionV1) + STFT based Whale Detection Algorithm

So, this repository is my PyTorch solution for the Kaggle whale-detection challenge. The objective of this challenge was to basically do a binary classification, (hence really a detection), on the existance of whale signals in the water.

It's a pretty cool problem that resonates with prior work I have done in underwater perception algorithm design - a freakishly hard problem I may add. (The speed of sound changes on you, multiple reflections from the environment, but probably the hardest of all being that it's hard to gather ground-truth). (<--- startup idea? 💥 )

Anyway! My approach is to first transform the 1D acoustic time-domain signal into a 2D time-frequency representation via the Short-Time-Fourier-Transform (STFT). We do this in the following way:

(Where K_F is the raw number of STFT frequency bands, n is the discrete time index, m is the temporal index of each STFT pixel, x[n] the raw audio signal being transformed, and k representing the index of each STFT pixel's frequency). In this way, we break the signal down into it's constituent time-frequency energy cells, (which are now pixels), but more crucially, we get a representation that has distinct features across time and frequency that will be correlated with each other. This then makes it ripe for a Convolutional Neural Network (CNN) to chew into.

Here is what a whale-signal's STFT looks like:

Pos whale spectrogram

Similarly, here's what a signal's STFT looks like without any whale signal. (Instead, there seems to be some short-time but uber wide band interference at some point in time).

Neg whale spectrogram

It's actually interesting, because there are basically so many more ways in which a signal can manifest itself as not a whale signal, VS as actually being a whale signal. Does that mean we can also frame the problem as learning the manifold of whale-signals and simply do outlier analysis on that? Something to think about. :)

Code Usage:

Ok - let us now talk about how to use the code:

The first thing you need to do is install PyTorch of course. Do this from here. I use a conda environment as they recommend, and I recommend you do the same.

Once this is done, activate your PyTorch environment.

Now we need to download the raw data. You can get that from Kaggle's site here. Unzip this data at a directory of your choosing. For the purpose of this tutorial, I am going to assume that you placed and unzipped the data as such: /Users/you/data/whaleData/. (We will only be using the training data so that we can split it into train/val/test. The reason is that we do not have access to Kaggle's test labels).

We are now going to do the following steps:

  • Convert the audio files into numpy STFT tensors:
    • python whaleDataCreatorToNumpy.py -s 1 -dataDir /Users/you/data/whaleData/train/ -labelcsv /Users/you/data/whaleData/train.csv -dataDirProcessed /Users/you/data/whaleData/processedData/ -ds 0.42 -rk 20 200
    • The -s 1 flag says we want to save the results, the -ds 0.42 says we want to downsample the STFT image by this amount, (to help with computation time), and the -rk 20 200 says that we want the "rows kept" to be indexed from 20 to 200. This is because the STFT is conjugate symmetric, but also because we make a determination by first swimming in the data, (I swear this pun is not intentional), that most of the informational content lies between those bands. (Again, the motivation is computational here as well).
  • Convert and split the STFT tensors into PyTorch training/val/test Torch tensors:
    • python whaleDataCreatorNumpyToTorchTensors.py -numpyDataDir /Users/you/data/whaleData/processedData/
    • Here, the original numpy tensors are first split and normalized, and then saved off into PyTorch tensors. (The split percentages are able to be user defined, I set the defaults set 20% for validation and 10% test). The PyTorch tensors are saved in the same directory as above.
  • Run the CNN classifier!
    • We are now ready to train the classifier! I have already designed an Inception-V1 CNN architecture, that can be loaded up automatically, and we can use this as so. The input dimensions are also guaranteed to be equal to the STFT image sizes here. At any rate, we do this like so:
    • python whaleClassifier.py -dataDirProcessed /Users/you/data/whaleData/processedData/ -g 0 -e 1 -lr 0.0002 -L2 0.01 -mb 4 -dp 0 -s 3 -dnn 'inceptionModuleV1_75x45'
    • The g term controls whether or not we want to use a GPU to trian, e controls the number of epochs we want to train over, lr is the learning rate, L2 is the L2 penalization amount for regularization, mb is the minibatch size, (which will be double this as the training composes a mini-batch to have an equal number of positive and negative samples), dp controls data parallelism (moot without multiple GPUs, and is really just a flag on whether or not to use multiple GPUs), s controls when and how often we save the net weights and validation losses, (option 3 saves the best performing model), and finally, -dnn is a flag that controls which DNN architecture we want to use. In this way, you can write your own DNN arch, and then simply call it by whatever name you give it for actual use. (I did this after I got tired of hard-coding every single DNN I designed).
    • If everything is running smoothly, you should see something like this as training progresses:
    • The "time" here just shows how long it takes between the reporting of each validation score. (Since I ran this on my CPU, it's 30 seconds / report, but expect this to be at least an order of magnitude faster on a respectable GPU).
  • Evauluate the results!
    • When your training is complete, you can then then run this script to give you automatically generated ROC and PR curves for your network's performance:
    • python resultsVisualization.py -dataDirProcessed /Users/you/data/whaleData/processedData/ -netDir .
    • After a good training session, you should get results that look like so:
    • I also show the normalized training / validation likelihoods and accuracies for the duration of the session:

So wow! An AUC of 0.9669! Not too shabby! Can still be improved, but considering the data looks like this below, our InceptionV1-CNN isn't doing too bad either. 💥

Owner
Tarin Ziyaee
Eng Manager @Facebook FRL neural interfaces | Director R&D @CTRL-labs neural inferfaces. | CTO @Voyage, autonomous vehicles | Perception @Apple Autonomous
Tarin Ziyaee
Python framework for Stochastic Differential Equations modeling

SDElearn: a Python package for SDE modeling This package implements functionalities for working with Stochastic Differential Equations models (SDEs fo

4 May 10, 2022
Ascend your Jupyter Notebook usage

Jupyter Ascending Sync Jupyter Notebooks from any editor About Jupyter Ascending lets you edit Jupyter notebooks from your favorite editor, then insta

Untitled AI 254 Jan 08, 2023
Agile SVG maker for python

Agile SVG Maker Need to draw hundreds of frames for a GIF? Need to change the style of all pictures in a PPT? Need to draw similar images with differe

SemiWaker 4 Sep 25, 2022
PyTorch implementation of "PatchGame: Learning to Signal Mid-level Patches in Referential Games" to appear in NeurIPS 2021

PatchGame: Learning to Signal Mid-level Patches in Referential Games This repository is the official implementation of the paper - "PatchGame: Learnin

Kamal Gupta 22 Mar 16, 2022
Official implementation of NeurIPS 2021 paper "Contextual Similarity Aggregation with Self-attention for Visual Re-ranking"

CSA: Contextual Similarity Aggregation with Self-attention for Visual Re-ranking PyTorch training code for CSA (Contextual Similarity Aggregation). We

Hui Wu 19 Oct 21, 2022
TorchX is a library containing standard DSLs for authoring and running PyTorch related components for an E2E production ML pipeline.

TorchX is a library containing standard DSLs for authoring and running PyTorch related components for an E2E production ML pipeline

193 Dec 22, 2022
The official homepage of the (outdated) COCO-Stuff 10K dataset.

COCO-Stuff 10K dataset v1.1 (outdated) Holger Caesar, Jasper Uijlings, Vittorio Ferrari Overview Welcome to official homepage of the COCO-Stuff [1] da

Holger Caesar 263 Dec 11, 2022
Code for our paper "MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction" published at ICCV 2021.

MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction This repository contains the code for the p

Sven 30 Jan 05, 2023
Opinionated code formatter, just like Python's black code formatter but for Beancount

beancount-black Opinionated code formatter, just like Python's black code formatter but for Beancount Try it out online here Features MIT licensed - b

Launch Platform 16 Oct 11, 2022
Implementation of: "Exploring Randomly Wired Neural Networks for Image Recognition"

RandWireNN Unofficial PyTorch Implementation of: Exploring Randomly Wired Neural Networks for Image Recognition. Results Validation result on Imagenet

Seung-won Park 684 Nov 02, 2022
eXPeditious Data Transfer

xpdt: eXPeditious Data Transfer About xpdt is (yet another) language for defining data-types and generating code for serializing and deserializing the

Gianni Tedesco 3 Jan 06, 2022
A basic reminder tool written in Python.

A simple Python Reminder Here's a basic reminder tool written in Python that speaks to the user and sends a notification. Run pip3 install pyttsx3 w

Sachit Yadav 4 Feb 05, 2022
Official repository for the paper "Self-Supervised Models are Continual Learners" (CVPR 2022)

Self-Supervised Models are Continual Learners This is the official repository for the paper: Self-Supervised Models are Continual Learners Enrico Fini

Enrico Fini 73 Dec 18, 2022
Script utilizando OpenCV e modelo Machine Learning para detectar o uso de máscaras.

Reconhecendo máscaras Este repositório contém um script em Python3 que reconhece se um rosto está ou não portando uma máscara! O código utiliza da bib

Maria Eduarda de Azevedo Silva 168 Oct 20, 2022
(CVPR 2021) Lifting 2D StyleGAN for 3D-Aware Face Generation

Lifting 2D StyleGAN for 3D-Aware Face Generation Official implementation of paper "Lifting 2D StyleGAN for 3D-Aware Face Generation". Requirements You

Yichun Shi 66 Nov 29, 2022
[ICCV'2021] "SSH: A Self-Supervised Framework for Image Harmonization", Yifan Jiang, He Zhang, Jianming Zhang, Yilin Wang, Zhe Lin, Kalyan Sunkavalli, Simon Chen, Sohrab Amirghodsi, Sarah Kong, Zhangyang Wang

SSH: A Self-Supervised Framework for Image Harmonization (ICCV 2021) code for SSH Representative Examples Main Pipeline RealHM DataSet Google Drive Pr

VITA 86 Dec 02, 2022
Automatically erase objects in the video, such as logo, text, etc.

Video-Auto-Wipe Read English Introduction:Here   本人不定期的基于生成技术制作一些好玩有趣的算法模型,这次带来的作品是“视频擦除”方向的应用模型,它实现的功能是自动感知到视频中我们不想看见的部分(譬如广告、水印、字幕、图标等等)然后进行擦除。由于图标擦

seeprettyface.com 141 Dec 26, 2022
EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

MADE (Multi-Adapter Dataset Experts) This repository contains the implementation of MADE (Multi-adapter dataset experts), which is described in the pa

Princeton Natural Language Processing 68 Jul 18, 2022
Pip-package for trajectory benchmarking from "Be your own Benchmark: No-Reference Trajectory Metric on Registered Point Clouds", ECMR'21

Map Metrics for Trajectory Quality Map metrics toolkit provides a set of metrics to quantitatively evaluate trajectory quality via estimating consiste

Mobile Robotics Lab. at Skoltech 31 Oct 28, 2022
Le dataset des images du projet d'IA de 2021

face-mask-dataset-ilc-2021 Le dataset des images du projet d'IA de 2021, Indiquez vos id git dans la issue pour les droits TL;DR: Choisir 200 images J

7 Nov 15, 2021