The Submission for SIMMC 2.0 Challenge 2021

Requirements

Preprocessing

  1. Download data
  • Download the data provided by the challenge organizer and put it in the data folder.
  • Unzip the data files.
  2. Save images
  • Preprocess the image files in advance. The preprocessed result is a mapping with the image name as the key and the visual data as the value.
python3 image_preprocessor.py
python3 image_preprocessor_final.py
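The key/value layout described above can be sketched as follows. This is only an illustration, not the contents of image_preprocessor.py: the names `build_image_cache`, `load_visual`, and `save_cache`, the accepted file extensions, and the pickle output format are all assumptions.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Dict


def build_image_cache(image_dir: str,
                      load_visual: Callable[[Path], Any],
                      extensions=(".png", ".jpg", ".jpeg")) -> Dict[str, Any]:
    """Map each image file name (key) to its preprocessed visual data (value).

    `load_visual` is a hypothetical callback that turns one image file into
    whatever representation the model consumes (for example a tensor).
    """
    cache: Dict[str, Any] = {}
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() in extensions:
            cache[path.name] = load_visual(path)
    return cache


def save_cache(cache: Dict[str, Any], out_path: str) -> None:
    # Pickle is an assumed storage format, used here to keep the sketch stdlib-only.
    with open(out_path, "wb") as f:
        pickle.dump(cache, f)
```

Preprocessing once and looking images up by name avoids decoding the same image file every time it is needed during training.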

Step 1 (ITM)

First, the model is post-trained with image-to-text matching (ITM). Here, each image is an object and the text is that object's visual metadata. Code is provided in the ITM folder.
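One way to form training pairs for such a matching objective is sketched below. The README does not say how mismatched pairs are chosen, so the negative sampling here (pairing an object with another object's metadata) is an assumption, as is the function name `make_matching_pairs`.

```python
import random
from typing import List, Tuple


def make_matching_pairs(object_ids: List[str],
                        metadata: List[str],
                        seed: int = 0) -> List[Tuple[str, str, int]]:
    """Return (object_id, text, label) triples: 1 = matched, 0 = mismatched."""
    if len(object_ids) != len(metadata):
        raise ValueError("object_ids and metadata must have the same length")
    rng = random.Random(seed)
    pairs: List[Tuple[str, str, int]] = []
    for i, obj in enumerate(object_ids):
        # Positive pair: the object with its own visual metadata.
        pairs.append((obj, metadata[i], 1))
        # Negative pair (assumed strategy): metadata of a different object
        # whose text differs from this object's metadata.
        candidates = [m for j, m in enumerate(metadata)
                      if j != i and m != metadata[i]]
        if candidates:
            pairs.append((obj, rng.choice(candidates), 0))
    return pairs
```

A binary classifier trained on these labels learns to tell whether a text describes the given object image.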

Step 2 (BTM)

Second, the model is pretrained so that the subtasks can use a background representation of the image (BTM). As in ITM, the model is trained to match an image with text, but here the image is the background of the dialog and the text is the entire dialog context. Code is provided in the BTM folder.
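As a rough sketch of what "the entire context of the dialog" could look like as a single text input, the function below joins all utterances of one dialog. The field names `dialogue`, `transcript`, and `system_transcript` are assumed to follow the SIMMC 2.0 data files, and the `User:`/`System:` prefixes and the function name are illustrative choices, not the repository's format.

```python
from typing import Dict, List


def flatten_dialog_context(dialog: Dict, sep: str = " ") -> str:
    """Join all user and system utterances of one dialog into one string."""
    parts: List[str] = []
    for turn in dialog.get("dialogue", []):
        user = turn.get("transcript", "")
        system = turn.get("system_transcript", "")
        if user:
            parts.append(f"User: {user}")
        if system:
            parts.append(f"System: {system}")
    return sep.join(parts)
```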

Step 3

This step trains the model for each subtask. The training code for each one is in its own folder (sub1, sub2_1, sub2_2, sub2_3, sub2_4, sub4).

Model

All models can be downloaded from the following link

model.pt is the model used to evaluate devtest; its results are saved in the dstc10-simmc-entry folder. model_final.pt is the model used to evaluate teststd; its results are saved in the dstc10-simmc-final-entry folder. However, training of the model was not completed within the challenge period, so for subtask 2 we ran inference on the teststd data with model.pt instead.

Evaluation

Evaluate with the scripts provided by the SIMMC challenge organizers:

$ python tools/disambiguator_evaluation.py \
    --pred_file="{PATH_TO_PRED_FILE}" \
    --test_file="{PATH_TO_TEST_FILE}"

(line-by-line evaluation)
$ python -m gpt2_dst.scripts.evaluate \
    --input_path_target={PATH_TO_GROUNDTRUTH_TARGET} \
    --input_path_predicted={PATH_TO_MODEL_PREDICTIONS} \
    --output_path_report={PATH_TO_REPORT}

(or, dialog-level evaluation)
$ python -m utils.evaluate_dst \
    --input_path_target={PATH_TO_GROUNDTRUTH_TARGET} \
    --input_path_predicted={PATH_TO_MODEL_PREDICTIONS} \
    --output_path_report={PATH_TO_REPORT}

$ python tools/response_evaluation.py \
    --data_json_path={PATH_TO_GOLD_RESPONSES} \
    --model_response_path={PATH_TO_MODEL_RESPONSES} \
    --single_round_evaluation

$ python tools/retrieval_evaluation.py \
    --retrieval_json_path={PATH_TO_GROUNDTRUTH_RETRIEVAL} \
    --model_score_path={PATH_TO_MODEL_CANDIDATE_SCORES} \
    --single_round_evaluation

DevTest Results

Subtask #1: Multimodal Disambiguation

| Test Method | Accuracy |
| --- | --- |
| GPT2 from CO (Challenge Organizer) | 73.9 |
| Ours | 92.28 |

Subtask #2: Multimodal Coreference Resolution

| Test Method | Object F1 |
| --- | --- |
| GPT2 from CO | 0.366 |
| Ours-1 (sub2_1) | 0.595 |
| Ours-2 (sub2_2) | 0.604 |
| Ours-3 (sub2_3) | 0.607 |
| Ours-4 (sub2_4) | 0.608 |
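For readers unfamiliar with the metric, the sketch below shows a set-based object F1 computed from predicted and gold object IDs per turn. It is an illustration only: pooling the counts over all turns before computing precision and recall is an assumption about the aggregation, the function name `object_f1` is hypothetical, and the organizers' evaluation script remains the authority for reported numbers.

```python
from typing import Iterable, Set, Tuple


def object_f1(predicted: Iterable[Set[int]],
              gold: Iterable[Set[int]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) with counts pooled over all turns."""
    n_correct = n_pred = n_gold = 0
    for pred_ids, gold_ids in zip(predicted, gold):
        n_correct += len(pred_ids & gold_ids)  # objects predicted and referenced
        n_pred += len(pred_ids)
        n_gold += len(gold_ids)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, `object_f1([{1, 2}, {3}], [{1}, {3, 4}])` counts two correct objects out of three predicted and three gold ones.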

Subtask #3: Multimodal Dialog State Tracking

We did not train or test a model for this subtask.

Subtask #4: Multimodal Dialog Response Generation

Generation

| Test Method | BLEU |
| --- | --- |
| GPT2 from CO | 0.192 |
| MTN-SIMMC2 from CO | 0.217 |
| Ours | 0.285 |

Retrieval

We did not train or test a model for retrieval.
