Replication of Pix2Seq with Pretrained Model

Last update: Nov 22, 2022

Pretrained-Pix2Seq

We provide a pre-trained Pix2Seq model. This version includes new data augmentation. The model was trained for 300 epochs and achieves 37 mAP without beam search or nucleus sampling.
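For readers unfamiliar with the decoding terms above, the following is a minimal sketch of the difference between plain greedy (argmax) token selection and nucleus (top-p) sampling over a single next-token distribution. It is an illustration only, not this repository's decoding code; the function names and the `top_p` default are assumptions made for the example.

```python
import random


def greedy_pick(probs):
    """Return the index of the most probable token (argmax decoding)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def nucleus_candidates(probs, top_p=0.9):
    """Return the smallest set of token indices, most probable first,
    whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return kept


def nucleus_pick(probs, top_p=0.9, rng=random):
    """Sample one token index from the nucleus, weighted by probability."""
    kept = nucleus_candidates(probs, top_p)
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Greedy selection is deterministic, while nucleus sampling draws from a truncated distribution, so repeated runs can produce different token sequences.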

Installation

Install PyTorch 1.5+ and torchvision 0.6+ (recommended: torch 1.8.1, torchvision 0.8.0).

Install pycocotools (for evaluation on COCO):

pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

That's it; you should now be able to train and evaluate detection models.
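The minimum versions listed above can be checked before training. The snippet below is a small sketch that uses only the Python standard library (`importlib.metadata`, Python 3.8+); the helper names `parse_version`, `meets_minimum`, and `check_environment` are made up for this example and are not part of the repository.

```python
import re
from importlib import metadata

# Minimum versions stated in the installation notes.
MINIMUMS = {"torch": "1.5", "torchvision": "0.6"}


def parse_version(text):
    """Turn a version string such as '1.8.1+cu111' into (1, 8, 1)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment(minimums=MINIMUMS):
    """Return a {package: status} report for each required package."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        status = "ok" if ok else f"too old (need {minimum}+)"
        report[package] = f"{installed}: {status}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

Version components are compared numerically, so a release such as 1.10.0 is correctly treated as newer than 1.5.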

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
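As a quick sanity check of the layout above, a short script can report which of the three expected directories are missing. This is a sketch, not a script shipped with the repository; the function name `missing_coco_dirs` and the default `./coco` path used in the demo are assumptions for illustration. It does not inspect the annotation files themselves.

```python
from pathlib import Path

# Sub-directories named in the expected directory structure.
EXPECTED_DIRS = ("annotations", "train2017", "val2017")


def missing_coco_dirs(root):
    """Return the expected sub-directories that do not exist under root.

    Path.is_dir() follows symbolic links, so a symlinked dataset
    folder is accepted as well.
    """
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_coco_dirs("./coco")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("COCO directory layout looks complete.")
```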

Training

First, link the COCO dataset into the project folder:

ln -s /path/to/coco ./coco

Then start training:

sh train.sh --model pix2seq --output_dir /path/to/save

To evaluate a saved checkpoint, add --resume and --eval:

sh train.sh --model pix2seq --output_dir /path/to/save --resume /path/to/checkpoints --eval
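When launching runs from Python (for example, from a job script), the two commands above can be assembled programmatically. The sketch below only uses the flags that appear in this README (`--model`, `--output_dir`, `--resume`, `--eval`); the helper name `build_command` is hypothetical, and nothing here is part of the repository.

```python
import shlex


def build_command(output_dir, resume=None, evaluate=False, model="pix2seq"):
    """Build the train.sh argument list for a training or evaluation run."""
    cmd = ["sh", "train.sh", "--model", model, "--output_dir", str(output_dir)]
    if resume is not None:
        cmd += ["--resume", str(resume)]
    if evaluate:
        cmd.append("--eval")
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them; pass the list to
    # subprocess.run(cmd, check=True) from the project root to execute.
    print(shlex.join(build_command("/path/to/save")))
    print(shlex.join(build_command("/path/to/save",
                                   resume="/path/to/checkpoints",
                                   evaluate=True)))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.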

COCO

Method	Backbone	Epochs	Batch Size	AP	AP50	AP75	Weights
Pix2Seq	R50	300	32	37.0	53.4	39.4	weight
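The AP50 and AP75 columns follow the COCO convention: average precision when a predicted box must overlap a ground-truth box with an intersection-over-union (IoU) of at least 0.5 or 0.75, respectively, while AP averages over IoU thresholds from 0.5 to 0.95. The actual numbers come from pycocotools; the snippet below is only a sketch of the underlying IoU test, with boxes assumed to be in `(x_min, y_min, x_max, y_max)` form and function names chosen for illustration.

```python
def box_iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def counts_as_match(pred, gt, iou_threshold):
    """True if the prediction overlaps the ground truth enough to match."""
    return box_iou(pred, gt) >= iou_threshold


if __name__ == "__main__":
    pred, gt = (2, 0, 12, 10), (0, 0, 10, 10)
    print("IoU:", round(box_iou(pred, gt), 3))
    print("match at 0.50:", counts_as_match(pred, gt, 0.50))
    print("match at 0.75:", counts_as_match(pred, gt, 0.75))
```

A prediction that is slightly shifted can count as correct at the 0.5 threshold but not at 0.75, which is why AP75 is lower than AP50 in the table.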

Contributors

Qiu Han, Peng Gao, Jingqiu Zhou (Beam Search)

Acknowledgement

Pix2Seq, DETR

Owner

Peng Gao