Pyramid Pooling Transformer for Scene Understanding

Last update: Dec 29, 2022

Related tags

Deep Learning P2T

Overview

Pyramid Pooling Transformer for Scene Understanding

Requirements:

torch 1.6+
torchvision 0.7.0
timm==0.3.2
Validated on torch 1.6.0, torchvision 0.7.0

Models Pretrained on ImageNet1K

Variants	Input Size	[email protected]	[email protected]	#Params (M)	Pretrained Models
P2T-Tiny	224 x 224	78.1	94.1	11.1	Google Drive
P2T-Small	224 x 224	82.1	95.9	23.0	Google Drive
P2T-Base	224 x 224	83.0	96.2	36.2	Google Drive

Pretrained Models for Downstream tasks

To be updated.

Something Else

Note: we have prepared a stronger version of P2T. Since P2T is still in peer review, we will release the stronger P2T after the acceptance.

You might also like...

Neural Scene Graphs for Dynamic Scene (CVPR 2021)

Implementation of Neural Scene Graphs, that optimizes multiple radiance fields to represent different objects and a static scene background. Learned representations can be rendered with novel object compositions and views.

151 Dec 26, 2022

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

35 Nov 20, 2022

Automatic number plate recognition using tech: Yolo, OCR, Scene text detection, scene text recognation, flask, torch

Automatic Number Plate Recognition Automatic Number Plate Recognition (ANPR) is the process of reading the characters on the plate with various optica

52 Dec 22, 2022

Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Make-A-Scene - PyTorch Pytorch implementation (inofficial) of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/

259 Dec 28, 2022

Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Learning the Best Pooling Strategy for Visual Semantic Embedding Official PyTorch implementation of the paper Learning the Best Pooling Strategy for V

106 Jan 6, 2023

Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

ATLOP Code for AAAI 2021 paper Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling. If you make use of this co

146 Nov 29, 2022

This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling.

Locus This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order

96 Dec 15, 2022

Compact Bilinear Pooling for PyTorch

Compact Bilinear Pooling for PyTorch. This repository has a pure Python implementation of Compact Bilinear Pooling and Count Sketch for PyTorch. This

234 Dec 7, 2022

A Pytorch Implementation for Compact Bilinear Pooling.

CompactBilinearPooling-Pytorch A Pytorch Implementation for Compact Bilinear Pooling. Adapted from tensorflow_compact_bilinear_pooling Prerequisites I

169 Dec 23, 2022

Comments

How to load ImageNet1K pretrained weight to semantic segmentation model?

Hello, thanks for open source!

I use mmseg, and load weight from image classification result, it warns: WARNING - The model and loaded state dict do not match exactly missing keys in source state_dict: backbone.head.weight, backbone.head.bias unexpected key in source state_dict: cls_token, ln1.bias, ln1.weight, layers.0.ln1.bias, layers.0.ln1.weight, layers.0.ln2.bias, layers.0.ln2.weight, layers.0.ffn.layers.0.0.bias, layers.0.ffn.layers.0.0.weight, layers.0.ffn.layers.1.bias, layers.0.ffn.layers.1.weight, layers.0.attn.attn.out_proj.bias, layers.0.attn.attn.out_proj.weight, layers.0.attn.attn.in_proj_bias, layers.0.attn.attn.in_proj_weight, layers.1.ln1.bias, layers.1.ln1.weight, layers.1.ln2.bias, layers.1.ln2.weight, layers.1.ffn.layers.0.0.bias, layers.1.ffn.layers.0.0.weight, layers.1.ffn.layers.1.bias, layers.1.ffn.layers.1.weight, layers.1.attn.attn.out_proj.bias, layers.1.attn.attn.out_proj.weight ...... And the experimental results are terrible as the experiments initialize weight with random.

So I load weight from ADE20K result, it work and warns: WARNING - The model and loaded state dict do not match exactly missing keys in source state_dict: backbone.head.weight, backbone.head.bias And the result is similar to the result you offer.

Which weight should I load? ImageNet1K or ADE20K? Or should I modify the keys of weight in ImageNet1K to adapt the key in segmentation?

opened by asd123pwj 8
Questions about your ablation studies

Hello,

I have some questions about your ablation studies of pyramid pooling. Could you detail about your baseline version in Table 9? First, you say that you replace P-MHSA with an MHSA with a single pooling operation, what is the detail about single pooling operation? Ex: Pooling Ratios? Second, do you compared your method with original MHSA?

opened by pp00704831 3
P2T replaces PVT trunk bug

When I replaced the PVT trunk with P2T in my code, I encountered an error ：
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 512, 3, 3]], which is output 0 of AdaptiveAvgPool2DBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

opened by liu-tianxiang 2
P2T on ImageNet-22K?

Hi @yuhuan-wu , thank you for share the code of this excellent work! Have you trained P2T on ImageNet-22K dataset or any further plan to do it? If so, could you please share the pretrained model on ImageNet-22k?

Thank you.

opened by fyaft2012 1

Releases(v1.0)

v1.0(Dec 1, 2022)

Source code(tar.gz)
Source code(zip)
p2t_base.pth(137.78 MB)
p2t_large.pth(208.09 MB)
p2t_small.pth(91.98 MB)
p2t_tiny.pth(44.31 MB)

Owner

Yu-Huan Wu

Ph.D. student at Nankai University

GitHub Repository

The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks

The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks This folder contains the code to reproduce the data in "The Implicit Bias o

0 Feb 05, 2022

Unofficial PyTorch implementation of the Adaptive Convolution architecture for image style transfer

AdaConv Unofficial PyTorch implementation of the Adaptive Convolution architecture for image style transfer from "Adaptive Convolutions for Structure-

65 Dec 22, 2022

Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning

SkFlow has been moved to Tensorflow. SkFlow has been moved to http://github.com/tensorflow/tensorflow into contrib folder specifically located here. T

3.2k Dec 29, 2022

Implementation for paper LadderNet: Multi-path networks based on U-Net for medical image segmentation

Implementation for paper LadderNet: Multi-path networks based on U-Net for medical image segmentation This implementation is based on orobix implement

116 Sep 06, 2022

An LSTM based GAN for Human motion synthesis

GAN-motion-Prediction An LSTM based GAN for motion synthesis has a few issues reading H3.6M data from A.Jain et al , will fix soon. Prediction of the

9 Jun 17, 2022

Serverless proxy for Spark cluster

Hydrosphere Mist Hydrosphere Mist is a serverless proxy for Spark cluster. Mist provides a new functional programming framework and deployment model f

317 Dec 01, 2022

Progressive Domain Adaptation for Object Detection

Progressive Domain Adaptation for Object Detection Implementation of our paper Progressive Domain Adaptation for Object Detection, based on pytorch-fa

96 Nov 25, 2022

A MatConvNet-based implementation of the Fully-Convolutional Networks for image segmentation

MatConvNet implementation of the FCN models for semantic segmentation This package contains an implementation of the FCN models (training and evaluati

175 Feb 18, 2022

Advanced Deep Learning with TensorFlow 2 and Keras (Updated for 2nd Edition)

1.5k Jan 03, 2023

Code release for NeRF (Neural Radiance Fields)

NeRF: Neural Radiance Fields Project Page | Video | Paper | Data Tensorflow implementation of optimizing a neural representation for a single scene an

6.5k Jan 01, 2023

PyTorch implementation of Constrained Policy Optimization

PyTorch implementation of Constrained Policy Optimization (CPO) This repository has a simple to understand and use implementation of CPO in PyTorch. A

25 Dec 08, 2022

Framework that uses artificial intelligence applied to mathematical models to make predictions

LiconIA Framework that uses artificial intelligence applied to mathematical models to make predictions Interface Overview Table of contents [TOC] 1 Ar

4 Jun 20, 2021

A Parameter-free Deep Embedded Clustering Method for Single-cell RNA-seq Data

A Parameter-free Deep Embedded Clustering Method for Single-cell RNA-seq Data Overview Clustering analysis is widely utilized in single-cell RNA-seque

3 May 08, 2022

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

GPT2-Pytorch with Text-Generator Better Language Models and Their Implications Our model, called GPT-2 (a successor to GPT), was trained simply to pre

775 Jan 08, 2023

a baseline to practice

ccks2021_track3_baseline a baseline to practice 路径可能会有问题，自己改改 torch==1.7.1 pyhton==3.7.1 transformers==4.7.0 cuda==11.0 this is a baseline, you can fi

45 Nov 23, 2022

Code accompanying the NeurIPS 2021 paper "Generating High-Quality Explanations for Navigation in Partially-Revealed Environments"

Generating High-Quality Explanations for Navigation in Partially-Revealed Environments This work presents an approach to explainable navigation under

1 Oct 28, 2022

Pyramid Pooling Transformer for Scene Understanding

Related tags

Overview

Pyramid Pooling Transformer for Scene Understanding

Models Pretrained on ImageNet1K

Pretrained Models for Downstream tasks

Something Else

You might also like...

Neural Scene Graphs for Dynamic Scene (CVPR 2021)

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

Automatic number plate recognition using tech: Yolo, OCR, Scene text detection, scene text recognation, flask, torch

Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling.

Compact Bilinear Pooling for PyTorch

A Pytorch Implementation for Compact Bilinear Pooling.

Comments

How to load ImageNet1K pretrained weight to semantic segmentation model?

Questions about your ablation studies

P2T replaces PVT trunk bug

P2T on ImageNet-22K?

Releases(v1.0)

v1.0(Dec 1, 2022)

Owner

Yu-Huan Wu

The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks

Unofficial PyTorch implementation of the Adaptive Convolution architecture for image style transfer

Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning

Implementation for paper LadderNet: Multi-path networks based on U-Net for medical image segmentation

An LSTM based GAN for Human motion synthesis

Serverless proxy for Spark cluster

Progressive Domain Adaptation for Object Detection

A MatConvNet-based implementation of the Fully-Convolutional Networks for image segmentation

Advanced Deep Learning with TensorFlow 2 and Keras (Updated for 2nd Edition)

Code release for NeRF (Neural Radiance Fields)

PyTorch implementation of Constrained Policy Optimization

Framework that uses artificial intelligence applied to mathematical models to make predictions

A Parameter-free Deep Embedded Clustering Method for Single-cell RNA-seq Data

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

a baseline to practice

Code accompanying the NeurIPS 2021 paper "Generating High-Quality Explanations for Navigation in Partially-Revealed Environments"

Code and datasets for TPAMI 2021

A DeepStack custom model for detecting common objects in dark/night images and videos.

Code for Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021)

The repository offers the official implementation of our BMVC 2021 paper in PyTorch.