We will release the code of "ConTNet: Why not use convolution and transformer at the same time?" in this repo

Last update: Nov 08, 2022

ConTNet

Introduction

ConTNet (Convolution-Transformer Network) is proposed mainly in response to two issues: (1) ConvNets lack a large receptive field, which limits their performance on downstream tasks. (2) Transformer-based models are not robust enough and require special training settings or hundreds of millions of images for pretraining, which limits their adoption. ConTNet alternates convolution and transformer layers. It is very robust and can be optimized like ResNet, unlike recently proposed transformer-based models (e.g., ViT, DeiT), which are sensitive to hyper-parameters and need many tricks when trained from scratch on a midsize dataset (e.g., ImageNet).
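To make the idea of alternating the two operators concrete, here is a minimal PyTorch sketch. It is not the released ConTNet code: the class name `AlternatingBlock`, the patch size, the head count, and the residual connection around the convolution are all assumptions made for illustration, since this README does not specify the block design.

```python
import torch
from torch import nn


class AlternatingBlock(nn.Module):
    """Hypothetical block: patch-wise transformer encoder, then a 3x3 conv."""

    def __init__(self, channels: int, patch: int, heads: int = 4):
        super().__init__()
        self.patch = patch
        # Standard transformer encoder layer applied to the tokens of each patch.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=channels,
            nhead=heads,
            dim_feedforward=2 * channels,
            batch_first=True,
        )
        # ResNet-style 3x3 convolution that mixes information across patches.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        p = self.patch
        # Split the feature map into non-overlapping p x p patches;
        # each patch becomes a sequence of p*p tokens with c channels.
        t = x.reshape(b, c, h // p, p, w // p, p)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, p * p, c)
        t = self.encoder(t)
        # Fold the patch sequences back into a (b, c, h, w) feature map.
        t = t.reshape(b, h // p, w // p, p, p, c).permute(0, 5, 1, 3, 2, 4)
        x = t.reshape(b, c, h, w)
        # Residual around the convolution (an assumption of this sketch).
        return x + self.conv(x)


block = AlternatingBlock(channels=16, patch=4)
out = block(torch.randn(2, 16, 8, 8))
```

The block keeps the input shape, so several such blocks could be stacked like ResNet stages. The height and width must be divisible by the patch size in this sketch.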

Main Results on ImageNet

name	resolution	acc@1	#params(M)	FLOPs(G)
Res-18	224x224	71.5	11.7	1.8
ConT-S	224x224	74.9	10.1	1.5
Res-50	224x224	77.1	25.6	4.0
ConT-M	224x224	77.6	19.2	3.1
Res-101	224x224	78.2	44.5	7.6
ConT-B	224x224	77.9	39.6	6.4
DeiT-Ti^*	224x224	72.2	5.7	1.3
ConT-Ti^*	224x224	74.9	5.8	0.8
Res-18^*	224x224	73.2	11.7	1.8
ConT-S^*	224x224	76.5	10.1	1.5
Res-50^*	224x224	78.6	25.6	4.0
DeiT-S^*	224x224	79.8	22.1	4.6
ConT-M^*	224x224	80.2	19.2	3.1
Res-101^*	224x224	80.0	44.5	7.6
DeiT-B^*	224x224	81.8	86.6	17.6
ConT-B^*	224x224	81.8	39.6	6.4

Note: ^* indicates training with strong augmentations.
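The trade-offs in the table can be checked with a little arithmetic. The snippet below is only a convenience sketch: the numbers are copied from the strong-augmentation rows above, and the helper name `compare` is made up for this example.

```python
# (acc@1 in %, params in M, FLOPs in G), copied from the table above.
RESULTS = {
    "Res-50*": (78.6, 25.6, 4.0),
    "ConT-M*": (80.2, 19.2, 3.1),
    "DeiT-B*": (81.8, 86.6, 17.6),
    "ConT-B*": (81.8, 39.6, 6.4),
}


def compare(model: str, baseline: str) -> dict:
    """Return the accuracy gain and relative savings of model over baseline."""
    acc, params, flops = RESULTS[model]
    b_acc, b_params, b_flops = RESULTS[baseline]
    return {
        "acc_gain": round(acc - b_acc, 1),
        "params_saved_pct": round(100 * (1 - params / b_params), 1),
        "flops_saved_pct": round(100 * (1 - flops / b_flops), 1),
    }


for model, baseline in [("ConT-M*", "Res-50*"), ("ConT-B*", "DeiT-B*")]:
    print(model, "vs", baseline, compare(model, baseline))
```

By these figures, ConT-M^* gains 1.6 points of acc@1 over Res-50^* with about 25% fewer parameters, and ConT-B^* matches DeiT-B^* with less than half the parameters and FLOPs.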

Main Results on Downstream Tasks

Object detection results on COCO.

method	backbone	#params(M)	FLOPs(G)	AP	APs	APm	APl
RetinaNet	Res-50	32.0	235.6	36.5	20.4	40.3	48.1
RetinaNet	ConTNet-M	27.0	217.2	37.9	23.0	40.6	50.4
FCOS	Res-50	32.2	242.9	38.7	22.9	42.5	50.1
FCOS	ConTNet-M	27.2	228.4	40.8	25.1	44.6	53.0
Faster R-CNN	Res-50	41.5	241.0	37.4	21.2	41.0	48.1
Faster R-CNN	ConTNet-M	36.6	225.6	40.0	25.4	43.0	52.0

Instance segmentation results on Cityscapes based on Mask-RCNN.

backbone	AP^bb	AP_s^bb	AP_m^bb	AP_l^bb	AP^mk	AP_s^mk	AP_m^mk	AP_l^mk
Res-50	38.2	21.9	40.9	49.5	34.7	18.3	37.4	47.2
ConT-M	40.5	25.1	44.4	52.7	38.1	20.9	41.0	50.3

Semantic segmentation results on Cityscapes.

model	mIOU
PSP-Res50	77.12
PSP-ConTM	78.28

Bib Citing

@article{yan2021contnet,
    title={ConTNet: Why not use convolution and transformer at the same time?},
    author={Haotian Yan and Zhe Li and Weijian Li and Changhu Wang and Ming Wu and Chuang Zhang},
    year={2021},
    journal={arXiv preprint arXiv:2104.13497}
}
