Distributed Arcface Training in Pytorch

Related tags

Deep LearningMaske_FR
Overview

Distributed Arcface Training in Pytorch

This is a deep learning library that makes face recognition efficient, and effective, which can train tens of millions identity on a single server.

Requirements

How to Training

To train a model, run train.py with the path to the configs:

1. Single node, 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=1234 train.py configs/ms1mv3_r50

2. Multiple nodes, each node 8 GPUs:

Node 0:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr="ip1" --master_port=1234 train.py train.py configs/ms1mv3_r50

Node 1:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr="ip1" --master_port=1234 train.py train.py configs/ms1mv3_r50

3.Training resnet2060 with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr="127.0.0.1" --master_port=1234 train.py configs/ms1mv3_r2060.py

Model Zoo

  • The models are available for non-commercial research purposes only.
  • All models can be found in here.
  • Baidu Yun Pan: e8pw
  • onedrive

Performance on ICCV2021-MFR

ICCV2021-MFR testset consists of non-celebrities so we can ensure that it has very few overlap with public available face recognition training set, such as MS1M and CASIA as they mostly collected from online celebrities. As the result, we can evaluate the FAIR performance for different algorithms.

For ICCV2021-MFR-ALL set, TAR is measured on all-to-all 1:1 protocal, with FAR less than 0.000001(e-6). The globalised multi-racial testset contains 242,143 identities and 1,624,305 images.

For ICCV2021-MFR-MASK set, TAR is measured on mask-to-nonmask 1:1 protocal, with FAR less than 0.0001(e-4). Mask testset contains 6,964 identities, 6,964 masked images and 13,928 non-masked images. There are totally 13,928 positive pairs and 96,983,824 negative pairs.

Datasets backbone Training throughout Size / MB ICCV2021-MFR-MASK ICCV2021-MFR-ALL
MS1MV3 r18 - 91 47.85 68.33
Glint360k r18 8536 91 53.32 72.07
MS1MV3 r34 - 130 58.72 77.36
Glint360k r34 6344 130 65.10 83.02
MS1MV3 r50 5500 166 63.85 80.53
Glint360k r50 5136 166 70.23 87.08
MS1MV3 r100 - 248 69.09 84.31
Glint360k r100 3332 248 75.57 90.66
MS1MV3 mobilefacenet 12185 7.8 41.52 65.26
Glint360k mobilefacenet 11197 7.8 44.52 66.48

Performance on IJB-C and Verification Datasets

Datasets backbone IJBC(1e-05) IJBC(1e-04) agedb30 cfp_fp lfw log
MS1MV3 r18 92.07 94.66 97.77 97.73 99.77 log
MS1MV3 r34 94.10 95.90 98.10 98.67 99.80 log
MS1MV3 r50 94.79 96.46 98.35 98.96 99.83 log
MS1MV3 r100 95.31 96.81 98.48 99.06 99.85 log
MS1MV3 r2060 95.34 97.11 98.67 99.24 99.87 log
Glint360k r18-0.1 93.16 95.33 97.72 97.73 99.77 log
Glint360k r34-0.1 95.16 96.56 98.33 98.78 99.82 log
Glint360k r50-0.1 95.61 96.97 98.38 99.20 99.83 log
Glint360k r100-0.1 95.88 97.32 98.48 99.29 99.82 log

Speed Benchmark

Arcface Torch can train large-scale face recognition training set efficiently and quickly. When the number of classes in training sets is greater than 300K and the training is sufficient, partial fc sampling strategy will get same accuracy with several times faster training performance and smaller GPU memory. Partial FC is a sparse variant of the model parallel architecture for large sacle face recognition. Partial FC use a sparse softmax, where each batch dynamicly sample a subset of class centers for training. In each iteration, only a sparse part of the parameters will be updated, which can reduce a lot of GPU memory and calculations. With Partial FC, we can scale trainset of 29 millions identities, the largest to date. Partial FC also supports multi-machine distributed training and mixed precision training.

Image text

More details see speed_benchmark.md in docs.

1. Training speed of different parallel methods (samples / second), Tesla V100 32GB * 8. (Larger is better)

- means training failed because of gpu memory limitations.

Number of Identities in Dataset Data Parallel Model Parallel Partial FC 0.1
125000 4681 4824 5004
1400000 1672 3043 4738
5500000 - 1389 3975
8000000 - - 3565
16000000 - - 2679
29000000 - - 1855

2. GPU memory cost of different parallel methods (MB per GPU), Tesla V100 32GB * 8. (Smaller is better)

Number of Identities in Dataset Data Parallel Model Parallel Partial FC 0.1
125000 7358 5306 4868
1400000 32252 11178 6056
5500000 - 32188 9854
8000000 - - 12310
16000000 - - 19950
29000000 - - 32324

Evaluation ICCV2021-MFR and IJB-C

More details see eval.md in docs.

Test

We tested many versions of PyTorch. Please create an issue if you are having trouble.

  • torch 1.6.0
  • torch 1.7.1
  • torch 1.8.0
  • torch 1.9.0

Citation

@inproceedings{deng2019arcface,
  title={Arcface: Additive angular margin loss for deep face recognition},
  author={Deng, Jiankang and Guo, Jia and Xue, Niannan and Zafeiriou, Stefanos},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4690--4699},
  year={2019}
}
@inproceedings{an2020partical_fc,
  title={Partial FC: Training 10 Million Identities on a Single Machine},
  author={An, Xiang and Zhu, Xuhan and Xiao, Yang and Wu, Lan and Zhang, Ming and Gao, Yuan and Qin, Bin and
  Zhang, Debing and Fu Ying},
  booktitle={Arxiv 2010.05222},
  year={2020}
}
Implementing DeepMind's Fast Reinforcement Learning paper

Fast Reinforcement Learning This is a repo where I implement the algorithms in the paper, Fast reinforcement learning with generalized policy updates.

Marcus Chiam 6 Nov 28, 2022
DANA paper supplementary materials

DANA Supplements This repository stores the data, results, and R scripts to generate these reuslts and figures for the corresponding paper Depth Norma

0 Dec 17, 2021
Keyword spotting on Arm Cortex-M Microcontrollers

Keyword spotting for Microcontrollers This repository consists of the tensorflow models and training scripts used in the paper: Hello Edge: Keyword sp

Arm Software 1k Dec 30, 2022
PyTorch implementation of federated learning framework based on the acceleration of global momentum

Federated Learning with Acceleration of Global Momentum PyTorch implementation of federated learning framework based on the acceleration of global mom

0 Dec 23, 2021
GANTheftAuto is a fork of the Nvidia's GameGAN

Description GANTheftAuto is a fork of the Nvidia's GameGAN, which is research focused on emulating dynamic game environments. The early research done

Harrison 801 Dec 27, 2022
Simple torch.nn.module implementation of Alias-Free-GAN style filter and resample

Alias-Free-Torch Simple torch module implementation of Alias-Free GAN. This repository including Alias-Free GAN style lowpass sinc filter @filter.py A

이준혁(Junhyeok Lee) 64 Dec 22, 2022
CVPR2022 (Oral) - Rethinking Semantic Segmentation: A Prototype View

Rethinking Semantic Segmentation: A Prototype View Rethinking Semantic Segmentation: A Prototype View, Tianfei Zhou, Wenguan Wang, Ender Konukoglu and

Tianfei Zhou 239 Dec 26, 2022
A collection of resources, problems, explanations and concepts that are/were important during my Data Science journey

Data Science Gurukul List of resources, interview questions, concepts I use for my Data Science work. Topics: Basics of Programming with Python + Unde

Smaranjit Ghose 10 Oct 25, 2022
JstDoS - HTTP Protocol Stack Remote Code Execution Vulnerability

jstDoS If you are going to skid that, please give credits ! ^^ ¿How works? This

apolo 4 Feb 11, 2022
GANimation: Anatomically-aware Facial Animation from a Single Image (ECCV'18 Oral) [PyTorch]

GANimation: Anatomically-aware Facial Animation from a Single Image [Project] [Paper] Official implementation of GANimation. In this work we introduce

Albert Pumarola 1.8k Dec 28, 2022
Implementation of the HMAX model of vision in PyTorch

PyTorch implementation of HMAX PyTorch implementation of the HMAX model that closely follows that of the MATLAB implementation of The Laboratory for C

Marijn van Vliet 52 Oct 13, 2022
Official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence".

The DETR approach applies the transformer encoder and decoder architecture to object detection and achieves promising performance. In this paper, we handle the critical issue, slow training convergen

281 Dec 30, 2022
Artificial Intelligence search algorithm base on Pacman

Pacman Search Artificial Intelligence search algorithm base on Pacman Source The Pacman Projects by the University of California, Berkeley. Layouts Di

Day Fundora 6 Nov 17, 2022
Embeddinghub is a database built for machine learning embeddings.

Embeddinghub is a database built for machine learning embeddings.

Featureform 1.2k Jan 01, 2023
🔥3D-RecGAN in Tensorflow (ICCV Workshops 2017)

3D Object Reconstruction from a Single Depth View with Adversarial Learning Bo Yang, Hongkai Wen, Sen Wang, Ronald Clark, Andrew Markham, Niki Trigoni

Bo Yang 125 Nov 26, 2022
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity Indic TTS Samples can be found at https://peter-yh-wu.github.io/cross-

Peter Wu 1 Nov 12, 2022
Pytorch Performace Tuning, WandB, AMP, Multi-GPU, TensorRT, Triton

Plant Pathology 2020 FGVC7 Introduction A deep learning model pipeline for training, experimentaiton and deployment for the Kaggle Competition, Plant

Bharat Giddwani 0 Feb 25, 2022
Code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”

GATER This repository contains the code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”. Our implementation is

Jiacheng Ye 12 Nov 24, 2022
An MQA (Studio, originalSampleRate) identifier for lossless flac files written in Python.

An MQA (Studio, originalSampleRate) identifier for "lossless" flac files written in Python.

Daniel 10 Oct 03, 2022
Joint Discriminative and Generative Learning for Person Re-identification. CVPR'19 (Oral)

Joint Discriminative and Generative Learning for Person Re-identification [Project] [Paper] [YouTube] [Bilibili] [Poster] [Supp] Joint Discriminative

NVIDIA Research Projects 1.2k Dec 30, 2022