Unimodal Face Classification with Multimodal Training

This is a PyTorch implementation of the following paper:

Unimodal Face Classification with Multimodal Training

Wenbin Teng (Boston University), Chongyang Bai (Dartmouth College)

Abstract: We propose a Multimodal Training Unimodal Test (MTUT) framework for robust face classification, which exploits the cross-modality relationship during training and applies it as a complement to the imperfect single-modality input during testing. Technically, during training, the framework (1) builds both intra-modality and cross-modality autoencoders with the aid of facial attributes to learn latent embeddings as multimodal descriptors, and (2) introduces a novel multimodal embedding divergence loss to align the heterogeneous features from different modalities, which also adaptively prevents a useless modality (if any) from confusing the model. This way, the learned autoencoders can generate robust embeddings for single-modality face classification at the test stage. We evaluate our framework on two face classification datasets and two kinds of testing input: (1) poor-condition images and (2) point clouds or 3D face meshes, when both 2D and 3D modalities are available for training.

The proposed method applies a 2D encoder and a 3D encoder to extract an embedding from each modality. The divergence between the two embeddings is minimized adaptively, with the classification loss used as the measure. Depending on the testing modality, the corresponding decoder reconstructs the 2D and 3D inputs from the feature embeddings. An overview of the proposed network is shown in the following picture:

[Figure: overview of the proposed network]
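
The idea of minimizing the embedding divergence adaptively, based on the classification loss, can be illustrated with a small PyTorch sketch. Everything in it is an assumption made for illustration and is not taken from this repository: the `TinyEncoder` class, the layer sizes, the `beta` parameter, the mean-squared-error divergence, and the exponential gate in `adaptive_weight`. The gate is just one plausible way to make the alignment term active only when the other modality currently classifies better. The decoders and the facial attributes described above are left out to keep the example short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a 2D or 3D encoder: flat features -> embedding."""

    def __init__(self, in_dim: int, emb_dim: int = 32, num_classes: int = 5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim)
        )
        self.classifier = nn.Linear(emb_dim, num_classes)

    def forward(self, x):
        z = self.backbone(x)
        return z, self.classifier(z)


def adaptive_weight(loss_self, loss_other, beta=2.0):
    """Assumed gate: positive only when the other modality has the lower loss."""
    gap = (loss_self - loss_other).detach()  # no gradient flows through the gate
    return torch.clamp(torch.exp(beta * gap) - 1.0, min=0.0)


def adaptive_divergence(z_self, z_other, loss_self, loss_other, beta=2.0):
    """Pull z_self toward z_other (held fixed as the target), scaled by the gate."""
    weight = adaptive_weight(loss_self, loss_other, beta)
    return weight * F.mse_loss(z_self, z_other.detach())


torch.manual_seed(0)
enc2d = TinyEncoder(in_dim=48)  # e.g. flattened image features (assumed size)
enc3d = TinyEncoder(in_dim=30)  # e.g. flattened point-cloud features (assumed size)

x2d, x3d = torch.randn(8, 48), torch.randn(8, 30)
labels = torch.randint(0, 5, (8,))

z2d, logits2d = enc2d(x2d)
z3d, logits3d = enc3d(x3d)
ce2d = F.cross_entropy(logits2d, labels)
ce3d = F.cross_entropy(logits3d, labels)

# Each modality is aligned to the other only while the other classifies better,
# so a modality that is currently unhelpful does not drag the better one down.
total = (
    ce2d
    + ce3d
    + adaptive_divergence(z2d, z3d, ce2d, ce3d)
    + adaptive_divergence(z3d, z2d, ce3d, ce2d)
)
total.backward()
```

With this gate, at most one of the two alignment terms is non-zero in a given step, so the weaker encoder is the one pulled toward the stronger one. At test time only the encoder for the available modality would be used.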
