A two-stage U-Net for high-fidelity denoising of historical recordings

Last update: Jan 05, 2023

Overview

A two-stage U-Net for high-fidelity denoising of historical recordings

Official repository of the paper (not submitted yet):

E. Moliner and V. Välimäki,, "A two-stage U-Net for high-fidelity denosing of historical recordinds", in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Singapore, May, 2022

Abstract

Enhancing the sound quality of historical music recordings is a long-standing problem. This paper presents a novel denoising method based on a fully-convolutional deep neural network. A two-stage U-Net model architecture is designed to model and suppress the degradations with high fidelity. The method processes the time-frequency representation of audio, and is trained using realistic noisy data to jointly remove hiss, clicks, thumps, and other common additive disturbances from old analog discs. The proposed model outperforms previous methods in both objective and subjective metrics. The results of a formal blind listening test show that the method can denoise real gramophone recordings with an excellent quality. This study shows the importance of realistic training data and the power of deep learning in audio restoration.

Listen to our audio samples

Requirements

You will need at least python 3.7 and CUDA 10.1 if you want to use GPU. See requirements.txt for the required package versions.

To install the environment through anaconda, follow the instructions:

conda env update -f environment.yml
conda activate historical_denoiser

Denoising Recordings

Run the following commands to clone the repository and install the pretrained weights of the two-stage U-Net model:

git clone https://github.com/eloimoliner/denoising-historical-recordings.git
cd denoising-historical-recordings
wget https://github.com/eloimoliner/denoising-historical-recordings/releases/download/v0.0/checkpoint.zip
unzip checkpoint.zip /experiments/trained_model/

If the environment is installed correctly, you can denoise an audio file by running:

bash inference.sh "file name"

A ".wav" file with the denoised version, as well as the residual noise and the original signal in "mono", will be generated in the same directory as the input file.

Training

TODO

Comments

Will it work in Windows without CUDA?

Hello, The readme says: "You will need at least python 3.7 and CUDA 10.1 if you want to use GPU."

Unfortunately, my first attempt to run it in Windows without CUDA-supporting VGA failed. There is really no separate environment file for CPU-only? Is it possible to make it work without massive changes to the code?

opened by vitacon 15
installation without conda

Hi,

could you leave some hints about how to install this without conda? Your readme appears to be very much specified to this one case. Also it seems that you develop under linux so you use bash to execute. Maybe here a hint for win- users would be cool too.

I am just trying to get this to run under windows and so far had no success. I will update if I get further. All the best!

opened by GitHubGeniusOverlord 9
strange tensorflow version in requirements.txt

Hi,

when running python -m pip install tensorflow==2.3.0 as indicated in your requirements file, I get

ERROR: Could not find a version that satisfies the requirement tensorflow==2.3.0 (from versions: 2.5.0rc0, 2.5.0rc1, 2.5.0rc2, 2.5.0rc3, 2.5.0, 2.5.1, 2.5.2, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.8.0rc0) ERROR: No matching distribution found for tensorflow==2.3.0

It seems this version isn't even supported by pip anymore. Upgrade to 2.5.0?

The same is true for scipy==1.4.1. Not sure about which version to take there.

opened by GitHubGeniusOverlord 3
Update inference.sh

Small change to allow spaces in file names. Bash expands the variable $1 correctly even if it is in double quotes, python receives a single argument and not (if there are spaces) multiple arguments.

opened by JorenSix 1
How to start training for denoising?

If I would like to do a denoising task, where I've clean signals (in the "clean" folder) and noisy signals (in the "noise" folder).

opened by listener17 1

Releases(v0.0)

v0.0(Aug 31, 2021)

Uploading pretrained model
Source code(tar.gz)
Source code(zip)
checkpoint.zip(251.80 MB)

Owner

Eloi Moliner Juanpere

Doctoral candidate on audio signal processing at Aalto university.

GitHub Repository

AFL binary instrumentation

E9AFL --- Binary AFL E9AFL inserts American Fuzzy Lop (AFL) instrumentation into x86_64 Linux binaries. This allows binaries to be fuzzed without the

242 Dec 12, 2022

Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)

Physion: Evaluating Physical Prediction from Vision in Humans and Machines This repo contains code and data to reproduce the results in our paper, Phy

38 Jan 06, 2023

YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with ONNX, TensorRT, ncnn, and OpenVINO supported.

Introduction YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and ind

7.7k Jan 03, 2023

A simple python module to generate anchor (aka default/prior) boxes for object detection tasks.

PyBx WIP A simple python module to generate anchor (aka default/prior) boxes for object detection tasks. Calculated anchor boxes are returned as ndarr

4 Dec 15, 2022

Attentive Implicit Representation Networks (AIR-Nets)

Attentive Implicit Representation Networks (AIR-Nets) Preprint | Supplementary | Accepted at the International Conference on 3D Vision (3DV) teaser.mo

29 Dec 07, 2022

DeepCO3: Deep Instance Co-segmentation by Co-peak Search and Co-saliency

[CVPR19] DeepCO3: Deep Instance Co-segmentation by Co-peak Search and Co-saliency (Oral paper) Authors: Kuang-Jui Hsu, Yen-Yu Lin, Yung-Yu Chuang PDF:

139 Dec 22, 2022

Proposed n-stage Latent Dirichlet Allocation method - A Novel Approach for LDA

n-stage Latent Dirichlet Allocation (n-LDA) Proposed n-LDA & A Novel Approach for classical LDA Latent Dirichlet Allocation (LDA) is a generative prob

4 Mar 07, 2022

Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

SETR - Pytorch Since the original paper (Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.) has no official

112 Dec 16, 2022

DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021)

DeepLM DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021) Run Please install th

130 Dec 02, 2022

This repository contains part of the code used to make the images visible in the article "How does an AI Imagine the Universe?" published on Towards Data Science.

Generative Adversarial Network - Generating Universe This repository contains part of the code used to make the images visible in the article "How doe

9 Dec 18, 2022

A two-stage U-Net for high-fidelity denoising of historical recordings

Related tags

Overview

A two-stage U-Net for high-fidelity denoising of historical recordings

Abstract

Requirements

Denoising Recordings

Training

Comments

Will it work in Windows without CUDA?

installation without conda

strange tensorflow version in requirements.txt

Update inference.sh

How to start training for denoising?

Releases(v0.0)

v0.0(Aug 31, 2021)

Owner

Eloi Moliner Juanpere

AFL binary instrumentation

Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)

YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with ONNX, TensorRT, ncnn, and OpenVINO supported.

A simple python module to generate anchor (aka default/prior) boxes for object detection tasks.

Attentive Implicit Representation Networks (AIR-Nets)

DeepCO3: Deep Instance Co-segmentation by Co-peak Search and Co-saliency

Proposed n-stage Latent Dirichlet Allocation method - A Novel Approach for LDA

Implementation of SETR model, Original paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers.

DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021)

This repository contains part of the code used to make the images visible in the article "How does an AI Imagine the Universe?" published on Towards Data Science.

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR2021)

GradAttack is a Python library for easy evaluation of privacy risks in public gradients in Federated Learning

A privacy-focused, intelligent security camera system.

Volsdf - Volume Rendering of Neural Implicit Surfaces

Urban mobility simulations with Python3, RLlib (Deep Reinforcement Learning) and Mesa (Agent-based modeling)

Code for paper Novel View Synthesis via Depth-guided Skip Connections

A LiDAR point cloud cluster for panoptic segmentation

Custom studies about block sparse attention.

CLOOB training (JAX) and inference (JAX and PyTorch)

Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)