Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]

Overview

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

Here is the code for ssbassline model. We also provide OCR results/features/models. The code is built on top of M4C, where more detailed information can also be found.

Citation

If you use ssbaseline in your work, please cite:

@article{zhu2020simple,
  title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
  author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
  journal={arXiv preprint arXiv:2012.05153},
  year={2020}
}

Installation

First install the repo using

git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop

Getting Data

We provide SBD-Trans OCR for TextVQA and ST-VQA datasets. The corresponding OCR Faster R-CNN features and Recog-CNN features are also released.

Datasets ImDBs Object Faster R-CNN Features OCR Faster R-CNN Features OCR Recog-CNN Features
TextVQA TextVQA ImDB Open Images TextVQA SBD-Trans OCRs TextVQA SBD-Trans OCRs
ST-VQA ST-VQA ImDB ST-VQA Objects ST-VQA SBD-Trans OCRs ST-VQA SBD-Trans OCRs

Pretrained Models

We release the following pretrained models for ssbaseline on TextVQA.

For the TextVQA dataset, we release: ssbaseline trained with ST-VQA as additional data (our best model) with SBD-Trans.

Datasets Config Files (under configs/vqa/) Pretrained Models Metrics Notes
TextVQA (m4c_textvqa) m4c_textvqa/m4c_with_stvqa.yml ssbaseline_with_stvqa val accuracy - 45.53%; test accuracy - 45.66% SBD-Trans OCRs; ST-VQA as additional data

Training and Evaluation

Please follow the M4C README for the training and evaluation of the M4C model on each dataset.

Owner
ZephyrZhuQi
Visual and linguistic reasoning.
ZephyrZhuQi
Python program that works as a contact list

Lista de Contatos Programa em Python que funciona como uma lista de contatos. Features Adicionar novo contato Remover contato Atualizar contato Pesqui

Victor B. Lino 3 Dec 16, 2021
Integrated physics-based and ligand-based modeling.

ComBind ComBind integrates data-driven modeling and physics-based docking for improved binding pose prediction and binding affinity prediction. Given

Dror Lab 44 Oct 26, 2022
Create Data & AI apps in 20 lines of code with Shimoku

Install with: pip install shimoku-api-python Start with: from os import getenv import shimoku_api_python.client as Shimoku

Shimoku 5 Nov 07, 2022
Local trajectory planner based on a multilayer graph framework for autonomous race vehicles.

Graph-Based Local Trajectory Planner The graph-based local trajectory planner is python-based and comes with open interfaces as well as debug, visuali

TUM - Institute of Automotive Technology 160 Jan 04, 2023
Making self-supervised learning work on molecules by using their 3D geometry to pre-train GNNs. Implemented in DGL and Pytorch Geometric.

3D Infomax improves GNNs for Molecular Property Prediction Video | Paper We pre-train GNNs to understand the geometry of molecules given only their 2D

Hannes Stärk 95 Dec 30, 2022
MEDS: Enhancing Memory Error Detection for Large-Scale Applications

MEDS: Enhancing Memory Error Detection for Large-Scale Applications Prerequisites cmake and clang Build MEDS supporting compiler $ make Build Using Do

Secomp Lab at Purdue University 34 Dec 14, 2022
Lighting the Darkness in the Deep Learning Era: A Survey, An Online Platform, A New Dataset

Lighting the Darkness in the Deep Learning Era: A Survey, An Online Platform, A New Dataset This repository provides a unified online platform, LoLi-P

Chongyi Li 457 Jan 03, 2023
"SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image", Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Humphrey Shi, Zhangyang Wang

SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image [Paper] [Website] Pipeline Code Environment pip install -r requirements

VITA 250 Jan 05, 2023
TICC is a python solver for efficiently segmenting and clustering a multivariate time series

TICC TICC is a python solver for efficiently segmenting and clustering a multivariate time series. It takes as input a T-by-n data matrix, a regulariz

406 Dec 12, 2022
Source code for our paper "Do Not Trust Prediction Scores for Membership Inference Attacks"

Do Not Trust Prediction Scores for Membership Inference Attacks Abstract: Membership inference attacks (MIAs) aim to determine whether a specific samp

<a href=[email protected]"> 3 Oct 25, 2022
ISBI 2022: Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image.

Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image Introduction This repository contains the PyTorch implem

25 Nov 09, 2022
DeepFashion2 is a comprehensive fashion dataset.

DeepFashion2 Dataset DeepFashion2 is a comprehensive fashion dataset. It contains 491K diverse images of 13 popular clothing categories from both comm

switchnorm 1.8k Jan 07, 2023
Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication"

NFFT4ANOVA Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication" This package uses th

Theresa Wagner 1 Aug 10, 2022
The most simple and minimalistic navigation dashboard.

Navigation This project follows a goal to have simple and lightweight dashboard with different links. I use it to have my own self-hosted service dash

Yaroslav 23 Dec 23, 2022
A "gym" style toolkit for building lightweight Neural Architecture Search systems

A "gym" style toolkit for building lightweight Neural Architecture Search systems

Jack Turner 12 Nov 05, 2022
Image Super-Resolution by Neural Texture Transfer

SRNTT: Image Super-Resolution by Neural Texture Transfer Tensorflow implementation of the paper Image Super-Resolution by Neural Texture Transfer acce

Zhifei Zhang 413 Nov 30, 2022
Code repo for "Cross-Scale Internal Graph Neural Network for Image Super-Resolution" (NeurIPS'20)

IGNN Code repo for "Cross-Scale Internal Graph Neural Network for Image Super-Resolution" [paper] [supp] Prepare datasets 1 Download training dataset

Shangchen Zhou 278 Jan 03, 2023
Implementation of "Bidirectional Projection Network for Cross Dimension Scene Understanding" CVPR 2021 (Oral)

Bidirectional Projection Network for Cross Dimension Scene Understanding CVPR 2021 (Oral) [ Project Webpage ] [ arXiv ] [ Video ] Existing segmentatio

Hu Wenbo 135 Dec 26, 2022
Official PyTorch implementation of UACANet: Uncertainty Aware Context Attention for Polyp Segmentation

UACANet: Uncertainty Aware Context Attention for Polyp Segmentation Official pytorch implementation of UACANet: Uncertainty Aware Context Attention fo

Taehun Kim 85 Dec 14, 2022
This repo holds code for TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

TransUNet This repo holds code for TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation Usage

1.4k Jan 04, 2023