PaSST: Efficient Training of Audio Transformers with Patchout

Last update: Dec 26, 2022

Overview

This is the implementation of "Efficient Training of Audio Transformers with Patchout".

Patchout significantly reduces the training time and GPU memory requirements to train transformers on audio spectrograms, while improving their performance.

Patchout works by dropping some of the input patches during training, either in an unstructured way (randomly, similar to dropout) or by dropping entire time frames or frequency bins of the extracted patches (similar to SpecAugment), which corresponds to rows/columns in step 3 of the figure below.
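The two variants can be illustrated on a grid of patch positions. The following is only a minimal sketch in plain Python, not the code used in this repo: the function name patchout_indices and the 12 x 99 example grid are assumptions made for illustration, while the parameter names mirror the u_patchout, s_patchout_f and s_patchout_t options used in the training commands.

```python
import random


def patchout_indices(n_freq, n_time, u_patchout=0, s_patchout_f=0,
                     s_patchout_t=0, rng=None):
    """Return the (freq, time) patch positions that survive patchout.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    # Structured patchout: drop whole frequency rows and whole time columns.
    kept_f = sorted(rng.sample(range(n_freq), n_freq - s_patchout_f))
    kept_t = sorted(rng.sample(range(n_time), n_time - s_patchout_t))
    kept = [(f, t) for f in kept_f for t in kept_t]
    # Unstructured patchout: drop individual patches at random.
    return sorted(rng.sample(kept, len(kept) - u_patchout))


# Example: structured patchout on an illustrative 12 x 99 grid.
structured = patchout_indices(12, 99, s_patchout_f=4, s_patchout_t=40,
                              rng=random.Random(0))
# Example: unstructured patchout of 400 patches on the same grid.
unstructured = patchout_indices(12, 99, u_patchout=400, rng=random.Random(0))
print(len(structured), len(unstructured))
```

Because the dropped patches never enter the transformer, the sequence it processes during training is shorter, which is where the savings in time and memory come from.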

Setting up the experiments environment

This repo uses forked versions of sacred for configuration and logging, and pytorch-lightning for training.

Using Mamba for the setup is recommended, as it is faster than conda:

conda install mamba -n base -c conda-forge

Now you can import the environment from environment.yml:

mamba env create -f environment.yml

You now have an environment named ba3l. Next, install the forked versions of sacred and pytorch-lightning, and ba3l:

# dependencies
conda activate ba3l
pip install https://github.com/kkoutini/sacred/archive/ba3l.zip
pip install https://github.com/kkoutini/pytorch-lightning/archive/ba3l.zip
pip install https://github.com/kkoutini/ba3l/archive/master.zip

To check the environment we used in our runs, see the environment.yml and pip_list.txt files, which were exported using:

conda env export --no-builds | grep -v "prefix" > environment.yml
pip list > pip_list.txt

Training on Audioset

Download and prepare the dataset as explained in the audioset page. The base PaSST model can be trained, for example, like this:

python ex_audioset.py with trainer.precision=16  models.net.arch=passt_deit_bd_p16_384 -p -m mongodb_server:27000:audioset21_balanced -c "PaSST base"

You can override any of the configuration options using the sacred syntax. To see the available options, either use omniboard or run:

 python ex_audioset.py print_config

In short:

All the configuration options under trainer are PyTorch Lightning Trainer API options.
models.net holds the PaSST options.
models.mel holds the preprocessing options.
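Conceptually, each key=value pair after "with" addresses one entry in a nested configuration by its dotted path. The sketch below illustrates that idea on a plain dict. It is not sacred's own parser, which handles more cases; the helper name apply_override and the starting values in the example config are assumptions.

```python
import ast


def apply_override(config, assignment):
    """Apply one 'a.b.c=value' assignment to a nested dict, in place.

    Hypothetical helper, for illustration only.
    """
    dotted_key, raw_value = assignment.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Walk down the nesting, creating intermediate levels if needed.
        node = node.setdefault(name, {})
    try:
        # Numbers, booleans and quoted strings become Python values.
        node[leaf] = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        # Anything else is kept as a plain string.
        node[leaf] = raw_value
    return config


# Illustrative starting values, not the repo's real defaults.
config = {"trainer": {"precision": 32}, "models": {"net": {"u_patchout": 0}}}
for item in ["trainer.precision=16",
             "models.net.arch=passt_deit_bd_p16_384",
             "models.net.u_patchout=400"]:
    apply_override(config, item)
print(config)
```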

For example, using only unstructured patchout of 400:

python ex_audioset.py with trainer.precision=16  models.net.arch=passt_deit_bd_p16_384  models.net.u_patchout=400  models.net.s_patchout_f=0 models.net.s_patchout_t=0 -p -m mongodb_server:27000:audioset21_balanced -c "Unstructured PaSST base"

Multi-GPU training can be enabled by setting the environment variable DDP, for example with 2 GPUs:

 DDP=2 python ex_audioset.py with trainer.precision=16  models.net.arch=passt_deit_bd_p16_384 -p -m mongodb_server:27000:audioset21_balanced -c "PaSST base 2 GPU"

Pre-trained models

Please check the releases page to download pre-trained models. In general, you can get a model pretrained on Audioset using:

from models.passt import get_model
model  = get_model(arch="passt_s_swa_p16_128_ap476", pretrained=True, n_classes=527, in_channels=1,
                   fstride=10, tstride=10,input_fdim=128, input_tdim=998,
                   u_patchout=0, s_patchout_t=40, s_patchout_f=4)

This will automatically download a PaSST model pretrained on Audioset with an mAP of 0.476. The model was trained with s_patchout_t=40, s_patchout_f=4, but you can change these to better fit your task or computational needs.
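To get a feel for what these values mean, the size of the patch grid can be worked out from the arguments above. This is a sketch under two assumptions that are not stated in this README: patches are 16 x 16 (suggested by "p16" in the model name) and are extracted by an unpadded strided window. The function name patch_grid is hypothetical.

```python
def patch_grid(input_fdim, input_tdim, patch_size=16, fstride=10, tstride=10):
    """Number of patch positions along frequency and time.

    Assumes an unpadded sliding window of size patch_size.
    """
    n_freq = (input_fdim - patch_size) // fstride + 1
    n_time = (input_tdim - patch_size) // tstride + 1
    return n_freq, n_time


n_freq, n_time = patch_grid(input_fdim=128, input_tdim=998)
total = n_freq * n_time
# Structured patchout removes 4 frequency rows and 40 time columns.
kept = (n_freq - 4) * (n_time - 40)
print(n_freq, n_time, total, kept)  # 12 99 1188 472
```

Under these assumptions, structured patchout with s_patchout_t=40 and s_patchout_f=4 reduces the training sequence from 1188 to 472 patches (not counting any special tokens the model adds).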

Contact

The repo will be updated. In the meantime, if you have any questions or problems, feel free to open an issue on GitHub or contact the authors directly.

Comments

FSD50K - validating on eval data
Hi! First off, excellent work with the module. It's showing great results so far in my project. I'm having trouble, however, with an experiment. I am trying to fine-tune and train the model on subsets (3k samples for training and validating) and have created hdf5 files for that. The paths in config.basedatasets are corrected for this.

The problem that I run into is that when I run the command: python ex_fsd50k.py evaluate_only with passt_s_swa_p16_s16_128_ap473 the program uses the evaluation data for validation. I confirmed this by making a change in fsd50k/dataset.py:

def __len__(self):
    if self.hdf5_file == "audioset_hdf5s/mp3/FSD50K.eval_mp3.hdf":
        return 300
    return self.length

which affects the number of validation batches.

I really don't understand what is going on. Isn't the model supposed to validate on the validation data?

Kindest regards, Ludvig.
opened by Ludvig-Joborn 5
No module named 'ba3l.ingredients'

Hi, I want to train PaSST on Audioset, but when I ran "ex_audioset.py" I got the error: "No module named 'ba3l.ingredients'". I have already set up the environment following the README. How can I fix it?

opened by kimsojeong1225 5
RuntimeError: The size of tensor a (2055) must match the size of tensor b (99) at non-singleton dimension 3

I use a trained model for inference and I encounter this problem when the file length is long.

Traceback (most recent call last):
  File "", line 1, in
  File "/home/xingyum/anaconda3/envs/ba3l/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xingyum/models/PaSST/output/openmic2008/_None/checkpoints/src/hear21passt/hear21passt/wrapper.py", line 38, in forward
    x, features = self.net(specs)
  File "/home/xingyum/anaconda3/envs/ba3l/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xingyum/models/PaSST/output/openmic2008/_None/checkpoints/src/hear21passt/hear21passt/models/passt.py", line 507, in forward
    x = self.forward_features(x)
  File "/home/xingyum/models/PaSST/output/openmic2008/_None/checkpoints/src/hear21passt/hear21passt/models/passt.py", line 454, in forward_features
    x = x + time_new_pos_embed
RuntimeError: The size of tensor a (2055) must match the size of tensor b (99) at non-singleton dimension 3

opened by 980202006 3
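One common workaround for inputs longer than the model's time dimension is to split the spectrogram into windows, run the model on each window, and average the scores. The sketch below only computes the window boundaries and the average; it is not taken from this repo or confirmed by the authors. The default of 998 frames is borrowed from the input_tdim=998 example in the README and may not match the checkpoint in this issue, and both function names are hypothetical. The releases also list Audioset checkpoints that support inference on 20-second and 30-second inputs.

```python
def split_frames(n_frames, max_frames=998):
    """Return (start, end) frame ranges, each at most max_frames long."""
    if max_frames <= 0:
        raise ValueError("max_frames must be positive")
    return [(start, min(start + max_frames, n_frames))
            for start in range(0, n_frames, max_frames)]


def average_scores(per_window_scores):
    """Average per-class scores over windows (one list of scores per window)."""
    n_windows = len(per_window_scores)
    return [sum(column) / n_windows for column in zip(*per_window_scores)]


# Example: a clip of 2500 spectrogram frames.
print(split_frames(2500))
```

The last window is usually shorter than the others; whether to pad it, drop it, or weight it differently depends on the task.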
Changing tdim for pretrained model

Thanks for sharing such great work! I want to use the pre-trained model, but changing input_tdim gives an error. My audio clips are relatively short, so I need a smaller input_tdim. How do I do that? The error occurs because the size of the pretrained layer does not match the current size of the model (after changing input_tdim).

opened by ranjith1604 3
Is it possible to install the passt with python=3.6?

Hi, thanks so much for sharing the great work! I'd like to use PaSST for downstream tasks and integrate it into an existing conda environment with python=3.6 (it's kind of painful to upgrade python from 3.6 to 3.7/3.8 due to many inconsistent packages). I know that python>=3.7 is required to install PaSST, but I'm wondering if it's possible to install it with python=3.6?

opened by Alibabade 2
Inference ESC-50 fine-tuned model

Hello, authors. Thank you for sharing the great work.

I tried to fine-tune the AudioSet pretrained model passt-s-f128-p16-s10-ap.476-swa.pt on the ESC-50 dataset using ex_esc50.py. I got checkpoints saved in output/esc50/_None/checkpoints/epoch=4-step=2669.ckpt. I want to load the checkpoint and run inference on an audio file. I tried to load the checkpoint and use passt_hear21 for inference, but I lost track of the process.

Could you please share how to inference with the saved checkpoints on audio file?

opened by myatmyintzuthin 2

Could not solve for environment specs

I cloned the repo. As per the README:

conda install mamba -n base -c conda-forge

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/miniconda3

  added / updated specs:
    - mamba


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-22.11.1              |   py39h2804cbe_1         873 KB  conda-forge
    fmt-9.1.0                  |       hffc8910_0         171 KB  conda-forge
    krb5-1.20.1                |       h127bd45_0         1.0 MB  conda-forge
    libarchive-3.5.2           |       h69ec738_3         1.5 MB  conda-forge
    libcurl-7.87.0             |       hbe9bab4_0         304 KB  conda-forge
    libedit-3.1.20191231       |       hc8eb9b7_2          94 KB  conda-forge
    libev-4.33                 |       h642e427_1          98 KB  conda-forge
    libmamba-1.1.0             |       h1254013_2         1.0 MB  conda-forge
    libmambapy-1.1.0           |   py39h8f82c16_2         214 KB  conda-forge
    libnghttp2-1.47.0          |       h232270b_1         816 KB  conda-forge
    libsolv-0.7.23             |       hb5ab8b9_0         373 KB  conda-forge
    libssh2-1.10.0             |       hb80f160_3         218 KB  conda-forge
    libxml2-2.9.14             |       h9d8dfc2_4         656 KB  conda-forge
    lz4-c-1.9.3                |       hbdafb3b_1         147 KB  conda-forge
    lzo-2.10                   |    h642e427_1000         154 KB  conda-forge
    mamba-1.1.0                |   py39hde45b87_2          48 KB  conda-forge
    openssl-1.1.1s             |       h03a7124_1         1.5 MB  conda-forge
    pybind11-abi-4             |       hd8ed1ab_3          10 KB  conda-forge
    reproc-14.2.4              |       h1a8c8d9_0          27 KB  conda-forge
    reproc-cpp-14.2.4          |       hb7217d7_0          20 KB  conda-forge
    yaml-cpp-0.7.0             |       hb7217d7_2         133 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         9.4 MB

The following NEW packages will be INSTALLED:

  fmt                conda-forge/osx-arm64::fmt-9.1.0-hffc8910_0 
  icu                conda-forge/osx-arm64::icu-70.1-h6b3803e_0 
  krb5               conda-forge/osx-arm64::krb5-1.20.1-h127bd45_0 
  libarchive         conda-forge/osx-arm64::libarchive-3.5.2-h69ec738_3 
  libcurl            conda-forge/osx-arm64::libcurl-7.87.0-hbe9bab4_0 
  libedit            conda-forge/osx-arm64::libedit-3.1.20191231-hc8eb9b7_2 
  libev              conda-forge/osx-arm64::libev-4.33-h642e427_1 
  libiconv           conda-forge/osx-arm64::libiconv-1.17-he4db4b2_0 
  libmamba           conda-forge/osx-arm64::libmamba-1.1.0-h1254013_2 
  libmambapy         conda-forge/osx-arm64::libmambapy-1.1.0-py39h8f82c16_2 
  libnghttp2         conda-forge/osx-arm64::libnghttp2-1.47.0-h232270b_1 
  libsolv            conda-forge/osx-arm64::libsolv-0.7.23-hb5ab8b9_0 
  libssh2            conda-forge/osx-arm64::libssh2-1.10.0-hb80f160_3 
  libxml2            conda-forge/osx-arm64::libxml2-2.9.14-h9d8dfc2_4 
  lz4-c              conda-forge/osx-arm64::lz4-c-1.9.3-hbdafb3b_1 
  lzo                conda-forge/osx-arm64::lzo-2.10-h642e427_1000 
  mamba              conda-forge/osx-arm64::mamba-1.1.0-py39hde45b87_2 
  pybind11-abi       conda-forge/noarch::pybind11-abi-4-hd8ed1ab_3 
  reproc             conda-forge/osx-arm64::reproc-14.2.4-h1a8c8d9_0 
  reproc-cpp         conda-forge/osx-arm64::reproc-cpp-14.2.4-hb7217d7_0 
  yaml-cpp           conda-forge/osx-arm64::yaml-cpp-0.7.0-hb7217d7_2 
  zstd               conda-forge/osx-arm64::zstd-1.5.2-h8128057_4 

The following packages will be UPDATED:

  ca-certificates    pkgs/main::ca-certificates-2022.10.11~ --> conda-forge::ca-certificates-2022.12.7-h4653dfc_0 
  libcxx                pkgs/main::libcxx-12.0.0-hf6beb65_1 --> conda-forge::libcxx-14.0.6-h2692d47_0 
  libzlib                                 1.2.12-ha287fd2_2 --> 1.2.13-h03a7124_4 
  openssl              pkgs/main::openssl-1.1.1s-h1a28f6b_0 --> conda-forge::openssl-1.1.1s-h03a7124_1 
  zlib                    pkgs/main::zlib-1.2.12-h5a0b063_2 --> conda-forge::zlib-1.2.13-h03a7124_4 

The following packages will be SUPERSEDED by a higher-priority channel:

  certifi            pkgs/main/osx-arm64::certifi-2022.12.~ --> conda-forge/noarch::certifi-2022.12.7-pyhd8ed1ab_0 
  conda              pkgs/main::conda-22.11.1-py39hca03da5~ --> conda-forge::conda-22.11.1-py39h2804cbe_1 


Proceed ([y]/n)? 


Downloading and Extracting Packages
                                                                                                                                     
Preparing transaction: done                                                                                                          
Verifying transaction: done                                                                                                          
Executing transaction: done

But then the next mamba command fails :\

mamba env create -f environment.yml

with

pkgs/r/osx-arm64                                              No change
pkgs/main/osx-arm64                                           No change
pkgs/main/noarch                                              No change
pkgs/r/noarch                                                 No change
conda-forge/osx-arm64                                4.7MB @ 351.1kB/s 13.6s
conda-forge/noarch                                  10.7MB @ 566.8kB/s 19.2s

                                                                                                                                     
Looking for: ['_libgcc_mutex==0.1=conda_forge', '_openmp_mutex==4.5=2_gnu', '_pytorch_select==0.1=cpu_0', 'appdirs==1.4.4=pyh9f0ad1d_0', 'audioread==2.1.9=py37h89c1867_4', 'blas==1.0=mkl', 'brotlipy==0.7.0=py37h5e8e339_1001', 'bzip2==1.0.8=h7f98852_4', 'c-ares==1.17.1=h7f98852_1', 'ca-certificates==2020.12.5=ha878542_0', 'cached-property==1.5.2=hd8ed1ab_1', 'cached_property==1.5.2=pyha770c72_1', 'certifi==2020.12.5=py37h89c1867_1', 'cffi==1.14.5=py37hc58025e_0', 'chardet==4.0.0=py37h89c1867_3', 'colorama==0.4.4=pyh9f0ad1d_0', 'cryptography==3.4.6=py37h5d9358c_0', 'cycler==0.10.0=py_2', 'decorator==4.4.2=py_0', 'docopt==0.6.2=py_1', 'ffmpeg==4.3.1=hca11adc_2', 'freetype==2.10.4=h0708190_1', 'gettext==0.19.8.1=h0b5b191_1005', 'gitdb==4.0.5=pyhd8ed1ab_1', 'gitpython==3.1.14=pyhd8ed1ab_0', 'gmp==6.2.1=h58526e2_0', 'gnutls==3.6.13=h85f3911_1', 'h5py==3.1.0=nompi_py37h1e651dc_100', 'hdf5==1.10.6=nompi_h6a2412b_1114', 'idna==2.10=pyh9f0ad1d_0', 'importlib-metadata==3.7.3=py37h89c1867_0', 'importlib_metadata==3.7.3=hd8ed1ab_0', 'intel-openmp==2020.2=254', 'joblib==1.0.1=pyhd8ed1ab_0', 'jpeg==9d=h36c2ea0_0', 'jsonpickle==1.4.1=pyh9f0ad1d_0', 'kiwisolver==1.3.1=py37h2527ec5_1', 'krb5==1.17.2=h926e7f8_0', 'lame==3.100=h7f98852_1001', 'lcms2==2.12=hddcbb42_0', 'ld_impl_linux-64==2.35.1=hea4e1c9_2', 'libblas==3.9.0=1_h86c2bf4_netlib', 'libcblas==3.9.0=5_h92ddd45_netlib', 'libcurl==7.75.0=hc4aaa36_0', 'libedit==3.1.20191231=he28a2e2_2', 'libev==4.33=h516909a_1', 'libffi==3.3=h58526e2_2', 'libflac==1.3.3=h9c3ff4c_1', 'libgcc-ng==9.3.0=h2828fa1_19', 'libgfortran-ng==9.3.0=hff62375_19', 'libgfortran5==9.3.0=hff62375_19', 'libgomp==9.3.0=h2828fa1_19', 'liblapack==3.9.0=5_h92ddd45_netlib', 'libllvm10==10.0.1=he513fc3_3', 'libnghttp2==1.43.0=h812cca2_0', 'libogg==1.3.4=h7f98852_1', 'libopenblas==0.3.12=pthreads_h4812303_1', 'libopus==1.3.1=h7f98852_1', 'libpng==1.6.37=h21135ba_2', 'librosa==0.8.0=pyh9f0ad1d_0', 'libsndfile==1.0.31=h9c3ff4c_1', 'libssh2==1.9.0=ha56f1ee_6', 
'libstdcxx-ng==9.3.0=h6de172a_19', 'libtiff==4.2.0=hbd63e13_2', 'libvorbis==1.3.7=h9c3ff4c_0', 'libwebp-base==1.2.0=h7f98852_2', 'libzlib==1.2.11=h36c2ea0_1013', 'llvm-openmp==11.1.0=h4bd325d_1', 'llvmlite==0.36.0=py37h9d7f4d0_0', 'lz4-c==1.9.3=h9c3ff4c_1', 'matplotlib-base==3.3.4=py37h0c9df89_0', 'mkl==2020.2=256', 'mkl-service==2.3.0=py37h8f50634_2', 'munch==2.5.0=py_0', 'ncurses==6.2=h58526e2_4', 'nettle==3.6=he412f7d_0', 'ninja==1.10.2=h4bd325d_0', 'numba==0.53.0=py37h7dd73a4_1', 'numpy==1.20.1=py37haa41c4c_0', 'olefile==0.46=pyh9f0ad1d_1', 'openblas==0.3.12=pthreads_h04b7a96_1', 'openh264==2.1.1=h780b84a_0', 'openjpeg==2.4.0=hb52868f_1', 'openssl==1.1.1k=h7f98852_0', 'packaging==20.9=pyh44b312d_0', 'pandas==1.2.3=py37hdc94413_0', 'pillow==8.1.2=py37h4600e1f_1', 'pip==21.0.1=pyhd8ed1ab_0', 'pooch==1.3.0=pyhd8ed1ab_0', 'py-cpuinfo==7.0.0=pyh9f0ad1d_0', 'pycparser==2.20=pyh9f0ad1d_2', 'pyopenssl==20.0.1=pyhd8ed1ab_0', 'pyparsing==2.4.7=pyhd8ed1ab_1', 'pysocks==1.7.1=py37h89c1867_5', 'pysoundfile==0.10.3.post1=pyhd3deb0d_0', 'python==3.7.10=hffdb5ce_100_cpython', 'python-dateutil==2.8.1=py_0', 'python_abi==3.7=3_cp37m', 'pytz==2021.1=pyhd8ed1ab_0', 'readline==8.0=he28a2e2_2', 'requests==2.25.1=pyhd3deb0d_0', 'resampy==0.2.2=py_0', 'scikit-learn==0.24.1=py37h69acf81_0', 'scipy==1.6.1=py37h14a347d_0', 'setuptools==49.6.0=py37h89c1867_3', 'six==1.15.0=pyh9f0ad1d_0', 'smmap==3.0.5=pyh44b312d_0', 'sqlite==3.34.0=h74cdb3f_0', 'threadpoolctl==2.1.0=pyh5ca1d4c_0', 'tk==8.6.10=h21135ba_1', 'tornado==6.1=py37h5e8e339_1', 'typing_extensions==3.7.4.3=py_0', 'urllib3==1.26.4=pyhd8ed1ab_0', 'wrapt==1.12.1=py37h5e8e339_3', 'x264==1!161.3030=h7f98852_1', 'xz==5.2.5=h516909a_1', 'zipp==3.4.1=pyhd8ed1ab_0', 'zlib==1.2.11=h36c2ea0_1013', 'zstd==1.4.9=ha95c52a_0']


Could not solve for environment specs
Encountered problems while solving:
  - nothing provides requested _libgcc_mutex ==0.1 conda_forge
  - nothing provides requested _openmp_mutex ==4.5 2_gnu
  - nothing provides requested audioread ==2.1.9 py37h89c1867_4
  - nothing provides requested blas ==1.0 mkl
  - nothing provides requested brotlipy ==0.7.0 py37h5e8e339_1001
  - nothing provides requested bzip2 ==1.0.8 h7f98852_4
  - nothing provides requested c-ares ==1.17.1 h7f98852_1
  - nothing provides requested ca-certificates ==2020.12.5 ha878542_0
  - nothing provides requested certifi ==2020.12.5 py37h89c1867_1
  - nothing provides requested cffi ==1.14.5 py37hc58025e_0
  - nothing provides requested chardet ==4.0.0 py37h89c1867_3
  - nothing provides requested cryptography ==3.4.6 py37h5d9358c_0
  - nothing provides requested ffmpeg ==4.3.1 hca11adc_2
  - nothing provides requested freetype ==2.10.4 h0708190_1
  - nothing provides requested gettext ==0.19.8.1 h0b5b191_1005
  - nothing provides requested gmp ==6.2.1 h58526e2_0
  - nothing provides requested gnutls ==3.6.13 h85f3911_1
  - nothing provides requested h5py ==3.1.0 nompi_py37h1e651dc_100
  - nothing provides requested hdf5 ==1.10.6 nompi_h6a2412b_1114
  - nothing provides requested importlib-metadata ==3.7.3 py37h89c1867_0
  - nothing provides requested intel-openmp ==2020.2 254
  - nothing provides requested jpeg ==9d h36c2ea0_0
  - nothing provides requested kiwisolver ==1.3.1 py37h2527ec5_1
  - nothing provides requested krb5 ==1.17.2 h926e7f8_0
  - nothing provides requested lame ==3.100 h7f98852_1001
  - nothing provides requested lcms2 ==2.12 hddcbb42_0
  - nothing provides requested ld_impl_linux-64 ==2.35.1 hea4e1c9_2
  - nothing provides requested libblas ==3.9.0 1_h86c2bf4_netlib
  - nothing provides requested libcblas ==3.9.0 5_h92ddd45_netlib
  - nothing provides requested libcurl ==7.75.0 hc4aaa36_0
  - nothing provides requested libedit ==3.1.20191231 he28a2e2_2
  - nothing provides requested libev ==4.33 h516909a_1
  - nothing provides requested libffi ==3.3 h58526e2_2
  - nothing provides requested libflac ==1.3.3 h9c3ff4c_1
  - nothing provides requested libgcc-ng ==9.3.0 h2828fa1_19
  - nothing provides requested libgfortran-ng ==9.3.0 hff62375_19
  - nothing provides requested libgfortran5 ==9.3.0 hff62375_19
  - nothing provides requested libgomp ==9.3.0 h2828fa1_19
  - nothing provides requested liblapack ==3.9.0 5_h92ddd45_netlib
  - nothing provides requested libllvm10 ==10.0.1 he513fc3_3
  - nothing provides requested libnghttp2 ==1.43.0 h812cca2_0
  - nothing provides requested libogg ==1.3.4 h7f98852_1
  - nothing provides requested libopenblas ==0.3.12 pthreads_h4812303_1
  - nothing provides requested libopus ==1.3.1 h7f98852_1
  - nothing provides requested libpng ==1.6.37 h21135ba_2
  - nothing provides requested libsndfile ==1.0.31 h9c3ff4c_1
  - nothing provides requested libssh2 ==1.9.0 ha56f1ee_6
  - nothing provides requested libstdcxx-ng ==9.3.0 h6de172a_19
  - nothing provides requested libtiff ==4.2.0 hbd63e13_2
  - nothing provides requested libvorbis ==1.3.7 h9c3ff4c_0
  - nothing provides requested libwebp-base ==1.2.0 h7f98852_2
  - nothing provides requested libzlib ==1.2.11 h36c2ea0_1013
  - nothing provides requested llvm-openmp ==11.1.0 h4bd325d_1
  - nothing provides requested llvmlite ==0.36.0 py37h9d7f4d0_0
  - nothing provides requested lz4-c ==1.9.3 h9c3ff4c_1
  - nothing provides requested matplotlib-base ==3.3.4 py37h0c9df89_0
  - nothing provides requested mkl ==2020.2 256
  - nothing provides requested mkl-service ==2.3.0 py37h8f50634_2
  - nothing provides requested ncurses ==6.2 h58526e2_4
  - nothing provides requested nettle ==3.6 he412f7d_0
  - nothing provides requested ninja ==1.10.2 h4bd325d_0
  - nothing provides requested numba ==0.53.0 py37h7dd73a4_1
  - nothing provides requested numpy ==1.20.1 py37haa41c4c_0
  - nothing provides requested openblas ==0.3.12 pthreads_h04b7a96_1
  - nothing provides requested openh264 ==2.1.1 h780b84a_0
  - nothing provides requested openjpeg ==2.4.0 hb52868f_1
  - nothing provides requested openssl ==1.1.1k h7f98852_0
  - nothing provides requested pandas ==1.2.3 py37hdc94413_0
  - nothing provides requested pillow ==8.1.2 py37h4600e1f_1
  - nothing provides requested pysocks ==1.7.1 py37h89c1867_5
  - nothing provides requested python ==3.7.10 hffdb5ce_100_cpython
  - nothing provides requested readline ==8.0 he28a2e2_2
  - nothing provides requested scikit-learn ==0.24.1 py37h69acf81_0
  - nothing provides requested scipy ==1.6.1 py37h14a347d_0
  - nothing provides requested setuptools ==49.6.0 py37h89c1867_3
  - nothing provides requested sqlite ==3.34.0 h74cdb3f_0
  - nothing provides requested tk ==8.6.10 h21135ba_1
  - nothing provides requested tornado ==6.1 py37h5e8e339_1
  - nothing provides requested wrapt ==1.12.1 py37h5e8e339_3
  - nothing provides requested x264 ==1!161.3030 h7f98852_1
  - nothing provides requested xz ==5.2.5 h516909a_1
  - nothing provides requested zlib ==1.2.11 h36c2ea0_1013
  - nothing provides requested zstd ==1.4.9 ha95c52a_0
  - package pytz-2021.1-pyhd8ed1ab_0 requires python >=3, but none of the providers can be installed

The environment can't be solved, aborting the operation

This is on an OSX Apple Silicon machine

opened by turian 1

ImportError: cannot import name 'F1' from 'torchmetrics' (/app/anaconda3/lib/python3.7/site-packages/torchmetrics/__init__.py)

python ex_openmic.py
Traceback (most recent call last):
  File "ex_openmic.py", line 5, in
    from pytorch_lightning.callbacks import ModelCheckpoint
  File "/root/work_project_2021/project_music2video/PaSST/src/pytorch-lightning/pytorch_lightning/__init__.py", line 65, in
    from pytorch_lightning import metrics
  File "/root/work_project_2021/project_music2video/PaSST/src/pytorch-lightning/pytorch_lightning/metrics/__init__.py", line 16, in
    from pytorch_lightning.metrics.classification import ( # noqa: F401
  File "/root/work_project_2021/project_music2video/PaSST/src/pytorch-lightning/pytorch_lightning/metrics/classification/__init__.py", line 19, in
    from pytorch_lightning.metrics.classification.f_beta import F1, FBeta # noqa: F401
  File "/root/work_project_2021/project_music2video/PaSST/src/pytorch-lightning/pytorch_lightning/metrics/classification/f_beta.py", line 16, in
    from torchmetrics import F1 as _F1
ImportError: cannot import name 'F1' from 'torchmetrics' (/app/anaconda3/lib/python3.7/site-packages/torchmetrics/__init__.py)

envs:
Name: torch
Version: 1.12.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /app/anaconda3/lib/python3.7/site-packages
Requires: typing-extensions
Required-by: torchvision, torchmetrics, torchaudio, timm, test-tube, Ba3l, pytorch-lightning

opened by aiXia121 1
The loop in the diagram

This is an amazing job! But I have a question: what does the loop in the diagram mean? In fact, I didn't find the loop operation in the paper and codes. Thanks!

opened by YangYangTaoTao 1
Installation issues
Hi, I am trying to install and run the PaSST-S method on my own data, but I get this error when I run python ex_audioset.py help:

File "ex_audioset.py", line 16, in <module>
    from helpers.mixup import my_mixup
ModuleNotFoundError: No module named 'helpers.mixup'
opened by p4vlos 1
Mismatched versions of pytorch-lightning and sacred

Hello, after running the following code, I ran the code
but I encountered the issue: Is this caused by the wrong versions of pytorch-lightning and sacred? When I upgraded pytorch-lightning to the latest version, that issue was solved, but the issue with sacred has not been solved. Could you please provide some help? Thank you very much!

opened by Junglesl 15

Releases(v.0.0.7-audioset)

v.0.0.7-audioset(Oct 20, 2022)

Pre-trained PaSST-U and PaSST-B on Audioset
Source code(tar.gz)
Source code(zip)
passt-b-f128-p16-s16-ap.459.pt(328.69 MB)
passt-u600-f128-p16-s16-ap.460.pt(328.69 MB)
v0.0.5(Mar 29, 2022)
fsd50k-passt-s-n-f128-p16-s16-ap.642.pt pre-trained on FSD50K with structured patchout and no overlap, mAP=0.642.

fsd50k-passt-s-f128-p16-s10-ap.655.pt pre-trained on FSD50K with structured patchout, mAP=0.655.

openmic-passt-s-f128-10sec-p16-s10-ap.85.pt pre-trained on OpenMIC-2008 with structured patchout, mAP=0.85.

passt-s-f128-30sec-p16-s10-ap.473-swa.pt pre-trained on Audioset but supports inference up to 30-seconds.

passt-s-f128-20sec-p16-s10-ap.474-swa.pt pre-trained on Audioset but supports inference up to 20-seconds.

Source code(tar.gz)
Source code(zip)
fsd50k-passt-s-f128-p16-s10-ap.655.pt(326.78 MB)
fsd50k-passt-s-n-f128-p16-s16-ap.642.pt(326.66 MB)
openmic-passt-s-f128-10sec-p16-s10-ap.85.pt(325.72 MB)
passt-s-f128-20sec-p16-s10-ap.474-swa.pt(328.99 MB)
passt-s-f128-30sec-p16-s10-ap.473-swa.pt(329.28 MB)
v.0.0.6(Jun 9, 2022)
Pre-trained models on the 5 folds of ESC-50.

The pre-processed ESC50 dataset for fine tuning.

Source code(tar.gz)
Source code(zip)
esc50-passt-s-n-f128-p16-s10-fold1-acc.967.pt(325.90 MB)
esc50-passt-s-n-f128-p16-s10-fold2-acc.977.pt(325.90 MB)
esc50-passt-s-n-f128-p16-s10-fold3-acc.959.pt(325.90 MB)
esc50-passt-s-n-f128-p16-s10-fold4-acc.987.pt(325.90 MB)
esc50-passt-s-n-f128-p16-s10-fold5-acc.962.pt(325.90 MB)
esc50.zip(458.36 MB)
v0.0.3-audioset(Mar 9, 2022)

Pre-trained models with a smaller STFT hop
Source code(tar.gz)
Source code(zip)
passt-s-f128-stfthop100-p16-s10-ap.473-swa.pt(329.34 MB)
passt-s-f128-stfthop160-p16-s10-ap.473-swa.pt(328.99 MB)
v0.0.2-audioset(Oct 28, 2021)

Added more pretrained models
Source code(tar.gz)
Source code(zip)
passt-s-f128-p16-s10-ap.472.pt(328.69 MB)
passt-s-f128-p16-s10-ap.4761-swa.pt(328.69 MB)
passt-s-f128-p16-s12-ap.470.pt(328.64 MB)
passt-s-f128-p16-s12-ap.473-swa.pt(328.64 MB)
passt-s-f128-p16-s14-ap.469.pt(328.60 MB)
passt-s-f128-p16-s14-ap.471-swa.pt(328.60 MB)
passt-s-f128-p16-s16-ap.468.pt(328.57 MB)
passt-s-f128-p16-s16-ap.473-swa.pt(328.57 MB)
v0.0.1-audioset(Oct 18, 2021)

Source code(tar.gz)
Source code(zip)
passt-s-f128-p16-s10-ap.476-swa.pt(328.69 MB)
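
The checkpoint file names above appear to encode the model settings. The sketch below reads them back under an inferred naming convention that is an assumption, not something documented here: "f" followed by digits is the number of frequency bins, "p" the patch size, "s" the patch stride, "ap" or "acc" the reported score, and "swa" marks a weight-averaged model. The function name parse_checkpoint_name is hypothetical.

```python
import re


def parse_checkpoint_name(filename):
    """Extract settings from a release file name (convention is assumed)."""
    fields = {}
    patterns = (("freq_bins", r"-f(\d+)"),
                ("patch_size", r"-p(\d+)"),
                ("stride", r"-s(\d+)"))
    for key, pattern in patterns:
        match = re.search(pattern, filename)
        if match:
            fields[key] = int(match.group(1))
    # Scores are written without the leading zero, e.g. "ap.476" for 0.476.
    match = re.search(r"-(ap|acc)\.(\d+)", filename)
    if match:
        fields[match.group(1)] = float("0." + match.group(2))
    fields["swa"] = "-swa" in filename
    return fields


print(parse_checkpoint_name("passt-s-f128-p16-s10-ap.476-swa.pt"))
```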

