Face Transformer for Recognition

Last update: Nov 30, 2022


Face-Transformer

This is the code for Face Transformer for Recognition (https://arxiv.org/abs/2103.14803v2).

Recently there has been great interest in Transformers, not only in NLP but also in computer vision. We wonder whether Transformers can be used for face recognition and whether they perform better than CNNs. Therefore, we investigate the performance of Transformer models in face recognition. The models are trained on the large-scale face recognition database MS-Celeb-1M and evaluated on several mainstream benchmarks, including the LFW, SLLFW, CALFW, CPLFW, TALFW, CFP-FP, AGEDB and IJB-C databases. We demonstrate that Transformer models achieve performance comparable to CNNs with a similar number of parameters and MACs.

Usage Instructions

1. Preparation

The code is mainly adapted from Vision Transformer and DeiT. In addition to PyTorch and torchvision, install vit_pytorch by Phil Wang and timm==0.3.2 by Ross Wightman. We sincerely appreciate their contributions.

pip install vit-pytorch

pip install timm==0.3.2

Copy the files in the folder "copy-to-vit_pytorch-path" into the installation directory of vit_pytorch.

.
├── __init__.py
├── vit_face.py
└── vits_face.py
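The copy step can be scripted. The sketch below is not part of the repository; it assumes the package imports as `vit_pytorch` (the pip distribution is named `vit-pytorch`) and that "copy-to-vit_pytorch-path" sits in the current working directory. The helper names `package_dir` and `copy_face_files` are hypothetical.

```python
import importlib
import importlib.util
import shutil
from pathlib import Path

# The three files shown in the tree above; existing files with the same names are overwritten.
FACE_FILES = ("__init__.py", "vit_face.py", "vits_face.py")


def package_dir(pkg_name: str) -> Path:
    """Return the directory of an installed package without importing it."""
    importlib.invalidate_caches()
    spec = importlib.util.find_spec(pkg_name)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"package {pkg_name!r} is not installed")
    return Path(list(spec.submodule_search_locations)[0])


def copy_face_files(src_dir="copy-to-vit_pytorch-path", pkg_name="vit_pytorch"):
    """Copy the Face Transformer files into the package directory (hypothetical helper)."""
    dst = package_dir(pkg_name)
    copied = []
    for name in FACE_FILES:
        target = dst / name
        shutil.copy2(Path(src_dir) / name, target)
        copied.append(target)
    return copied
```

Running `copy_face_files()` from the repository root performs the same copy as doing it by hand; re-run it after reinstalling vit-pytorch, since a reinstall restores the original `__init__.py`.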

2. Databases

Download the training database, MS-Celeb-1M (the ms1m-retinaface version), and put it in the folder 'Data'.

Download the test databases listed below and put them in the folder 'eval'.

LFW: Baidu Netdisk (password: dfj0), Google Drive
SLLFW: Baidu Netdisk (password: l1z6), Google Drive
CALFW: Baidu Netdisk (password: vvqe), Google Drive
CPLFW: Baidu Netdisk (password: jyp9), Google Drive
TALFW: Baidu Netdisk (password: izrg), Google Drive
CFP_FP: Baidu Netdisk (password: 4fem), Google Drive (refer to InsightFace)
AGEDB: Baidu Netdisk (password: rlqf), Google Drive (refer to InsightFace)
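The CFP_FP and AGEDB entries point to InsightFace. Assuming the downloaded test sets use InsightFace's verification `.bin` layout (a pickled pair of encoded-image bytes and a same/different label list, with two consecutive images per pair), a minimal loader could look like the following. The helper `load_verification_bin` and a file name such as `eval/lfw.bin` are hypothetical, and images are left as encoded bytes to avoid extra dependencies.

```python
import pickle
from typing import List, Tuple


def load_verification_bin(path) -> List[Tuple[bytes, bytes, bool]]:
    """Return (image_a, image_b, same_identity) triples from an InsightFace-style .bin file."""
    with open(path, "rb") as f:
        try:
            bins, issame_list = pickle.load(f)
        except UnicodeDecodeError:
            # Files pickled under Python 2 need byte-string decoding.
            f.seek(0)
            bins, issame_list = pickle.load(f, encoding="bytes")
    if len(bins) != 2 * len(issame_list):
        raise ValueError(
            f"{path}: expected two images per pair, got {len(bins)} images "
            f"for {len(issame_list)} pairs"
        )
    # Images 2*i and 2*i+1 form pair i.
    return [(bins[2 * i], bins[2 * i + 1], bool(same)) for i, same in enumerate(issame_list)]
```

Decoding the bytes into arrays (for example with Pillow or OpenCV) and applying the same preprocessing as training would be the next step before feeding the pairs to a model.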

3. Train Models

ViT-P8S8

CUDA_VISIBLE_DEVICES='0,1,2,3' python3 -u train.py -b 480 -w 0,1,2,3 -d retina -n VIT -head CosFace --outdir ./results/ViT-P8S8_ms1m_cosface_s1 --warmup-epochs 1 --lr 3e-4 

CUDA_VISIBLE_DEVICES='0,1,2,3' python3 -u train.py -b 480 -w 0,1,2,3 -d retina -n VIT -head CosFace --outdir ./results/ViT-P8S8_ms1m_cosface_s2 --warmup-epochs 0 --lr 1e-4 -r path_to_model 

CUDA_VISIBLE_DEVICES='0,1,2,3' python3 -u train.py -b 480 -w 0,1,2,3 -d retina -n VIT -head CosFace --outdir ./results/ViT-P8S8_ms1m_cosface_s3 --warmup-epochs 0 --lr 5e-5 -r path_to_model
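The `-head CosFace` option selects a CosFace (large margin cosine) classification head. As a rough illustration of the idea, not the repository's implementation, the NumPy sketch below L2-normalizes the embeddings and class weights, subtracts a margin `m` from the target-class cosine, and rescales everything by `s`. The function names and the defaults `s=64.0`, `m=0.35` are assumptions.

```python
import numpy as np


def cosface_logits(embeddings, weights, labels, s=64.0, m=0.35):
    """CosFace logits: s * (cos(theta_j) - m * [j == y]).

    embeddings: (batch, dim) features from the backbone.
    weights:    (num_classes, dim) class weight vectors of the head.
    labels:     (batch,) integer identity labels.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = x @ w.T  # (batch, num_classes) cosine similarities
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m  # margin applied to the target class only
    return s * (cos - margin)


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Because the margin lowers only the target logit, the loss can reach zero only when each embedding is closer (in cosine terms) to its own class by more than `m`, which is what pushes identities apart.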

ViT-P12S8

CUDA_VISIBLE_DEVICES='0,1,2,3' python3 -u train.py -b 480 -w 0,1,2,3 -d retina -n VITs -head CosFace --outdir ./results/ViT-P12S8_ms1m_cosface_s1 --warmup-epochs 1 --lr 3e-4 

CUDA_VISIBLE_DEVICES='0,1,2,3' python3 -u train.py -b 480 -w 0,1,2,3 -d retina -n VITs -head CosFace --outdir ./results/ViT-P12S8_ms1m_cosface_s2 --warmup-epochs 0 --lr 1e-4 -r path_to_model 

CUDA_VISIBLE_DEVICES='0,1,2,3' python3 -u train.py -b 480 -w 0,1,2,3 -d retina -n VITs -head CosFace --outdir ./results/ViT-P12S8_ms1m_cosface_s3 --warmup-epochs 0 --lr 5e-5 -r path_to_model
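Both models follow the same three-stage schedule: stage 1 uses one warmup epoch at `--lr 3e-4`, stages 2 and 3 drop the warmup and lower the learning rate to 1e-4 and then 5e-5, and `-r path_to_model` presumably resumes from the checkpoint saved by the previous stage. The helper below only rebuilds the commands shown above so the stages can be scripted; the name `stage_commands` is hypothetical, and the checkpoint paths still have to be supplied by hand.

```python
import shlex

# (learning rate, warmup epochs) for the three stages in the commands above.
STAGES = [("3e-4", 1), ("1e-4", 0), ("5e-5", 0)]
# Model name used in --outdir -> value passed to -n.
NETWORKS = {"ViT-P8S8": "VIT", "ViT-P12S8": "VITs"}


def stage_commands(model, resume_paths=None):
    """Return the three train.py command lines for `model` (hypothetical helper).

    resume_paths: optional list of two checkpoint paths for stages 2 and 3
    (outputs of stages 1 and 2); defaults to the placeholder 'path_to_model'.
    """
    resume_paths = resume_paths or ["path_to_model", "path_to_model"]
    cmds = []
    for i, (lr, warmup) in enumerate(STAGES, start=1):
        args = ["python3", "-u", "train.py", "-b", "480", "-w", "0,1,2,3",
                "-d", "retina", "-n", NETWORKS[model], "-head", "CosFace",
                "--outdir", f"./results/{model}_ms1m_cosface_s{i}",
                "--warmup-epochs", str(warmup), "--lr", lr]
        if i > 1:
            args += ["-r", resume_paths[i - 2]]  # resume from the previous stage
        cmds.append("CUDA_VISIBLE_DEVICES='0,1,2,3' " + shlex.join(args))
    return cmds
```

For example, `for c in stage_commands("ViT-P12S8"): print(c)` prints the three ViT-P12S8 commands; each stage must finish before its checkpoint path can be passed to the next.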

4. Pretrained Models and Test Models (on LFW, SLLFW, CALFW, CPLFW, TALFW, CFP_FP, AGEDB)

You can download the following pretrained models:

ViT-P8S8: Baidu Netdisk(password: spkf), Google Drive
ViT-P12S8: Baidu Netdisk(password: 7caa), Google Drive

You can test the models with test.py, passing --network VIT for ViT-P8S8 checkpoints and --network VITs for ViT-P12S8 checkpoints:

python test.py --model path_to_ViT-P8S8_model --network VIT

python test.py --model ./results/ViT-P12S8_ms1m_cosface/Backbone_VITs_Epoch_2_Batch_12000_Time_2021-03-17-04-05_checkpoint.pth --network VITs
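test.py reports verification results on the benchmarks listed in the heading, and its exact protocol is defined in the script. As an assumption-labeled sketch of the usual pair-verification metric (cosine similarity between the two embeddings of a pair, thresholded, with the threshold chosen to maximize accuracy), one could write the following. The function name `verification_accuracy` is hypothetical, and standard protocols such as LFW's pick the threshold with 10-fold cross-validation rather than on the full set as done here.

```python
import numpy as np


def verification_accuracy(emb_a, emb_b, is_same):
    """Best-threshold accuracy of cosine-similarity face verification.

    emb_a, emb_b: (num_pairs, dim) embeddings of the two faces in each pair.
    is_same:      (num_pairs,) booleans, True if both faces share an identity.
    Returns (accuracy, threshold).
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sims = np.sum(a * b, axis=1)  # cosine similarity per pair
    is_same = np.asarray(is_same, dtype=bool)
    best_acc, best_t = 0.0, 0.0
    # Each observed similarity is a candidate threshold: predict "same" if sim >= t.
    for t in np.unique(sims):
        acc = np.mean((sims >= t) == is_same)
        if acc > best_acc:
            best_acc, best_t = float(acc), float(t)
    return best_acc, best_t
```

In practice the embeddings would come from the backbone loaded from a downloaded checkpoint, often averaged with the embedding of the horizontally flipped image, before being passed to this function.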

Owner

Zhong Yaoyao