Visualizer using audio and semantic analysis to explore BigGAN (Brock et al., 2018) latent space.

Last update: Nov 21, 2022

Overview

BigGAN Audio Visualizer

Description

This visualizer explores BigGAN (Brock et al., 2018) latent space by using pitch/tempo of an audio file to generate and interpolate between noise/class vector inputs to the model. Classes are chosen manually or optionally using semantic similarity on BERT encodings of a lyrics corpus.

Usage:

usage: visualize.py [-h] -s SONG [--resolution {128,256,512}] [-d DURATION]
               [-ps [200-295]] [-ts [0.05-0.8]]
               [--classes CLASSES [CLASSES ...]] [-n NUM_CLASSES]
               [--jitter [0-1]] [--frame_length i*2^6] [--truncation [0.1-1]]
               [--smooth_factor [10-30]] [--batch_size BATCH_SIZE]
               [-o OUTPUT_FILE] [--use_last_vectors] [--use_last_classes]
               [-l LYRICS]

Arguments

short	long	default	range	help
`-h`	`--help`			show this help message and exit
`-s`	`--song`	`input/romantic.mp3`		path to input audio file
	`--resolution`	`512`	`{128,256,512}`	output video resolution
`-d`	`--duration`	`None`		output video duration
`-ps`	`--pitch_sensitivity`	`220`	`[200-295]`	controls the sensitivity of the class vector to changes in pitch
`-ts`	`--tempo_sensitivity`	`0.25`	`[0.05-0.8]`	controls the sensitivity of the noise vector to changes in volume and tempo
	`--classes`	`None`		manually specify [--num_classes] ImageNet classes
`-n`	`--num_classes`	`12`		number of unique classes to use
	`--jitter`	`0.5`	`[0-1]`	controls jitter of the noise vector to reduce repitition
	`--frame_length`	`512`	`i*2^6`	number of audio frames to video frames in the output
	`--truncation`	`1`	`[0.1-1]`	BigGAN truncation parameter controls complexity of structure within frames
	`--smooth_factor`	`20`	`[10-30]`	controls interpolation between class vectors to smooth rapid flucations
	`--batch_size`	`30`		BigGAN batch_size
`-o`	`--output_file`			name of output file stored in output/, defaults to [--song] path base_name
	`--use_last_vectors`	`False`		set flag to use previous saved class/noise vectors
	`--use_last_classes`	`False`		set flag to use previous classes
`-l`	`--lyrics`	`None`		path to lyrics file; setting [--lyrics LYRICS] computes classes by semantic similarity under BERT encodings

Visualizer using audio and semantic analysis to explore BigGAN (Brock et al., 2018) latent space.

Related tags

Overview

BigGAN Audio Visualizer

Description

Usage:

Arguments

Owner

Rush Kapoor

Architecture Patterns with Python (TDD, DDD, EDM)

Topic Modelling for Humans

Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World

Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation

Object detection using yolo-tiny model and opencv used as backend

ISBI 2022: Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image.

Code for unmixing audio signals in four different stems "drums, bass, vocals, others". The code is adapted from "Jukebox: A Generative Model for Music"

Modified fork of Xuebin Qin's U-2-Net Repository. Used for demonstration purposes.

Removing Inter-Experimental Variability from Functional Data in Systems Neuroscience

[BMVC'21] Official PyTorch Implementation of Grounded Situation Recognition with Transformers

Geometry-Aware Learning of Maps for Camera Localization (CVPR2018)

Simulating Sycamore quantum circuits classically using tensor network algorithm.

Implement face detection, and age and gender classification, and emotion classification.

Official PyTorch implementation of the paper "TEMOS: Generating diverse human motions from textual descriptions"

Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)

Automatic Attendance marker for LMS Practice School Division, BITS Pilani

Human Pose estimation with TensorFlow framework

Train a deep learning net with OpenStreetMap features and satellite imagery.

HyperLib: Deep learning in the Hyperbolic space

Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts