Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

Last update: Dec 05, 2022

Related tags

Overview

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: arXiv

MMCoref_cleaned

Code for the MMCoref task of the SIMMC 2.0 dataset.
Pretrained vision-language models adapted from Transformers-VQA.
Zero-shot visual feature extraction using CLIP and BUTD.
Zero-shot non-visual prefab feature (flattened into strings) extraction using BERT and SBERT.

Dependencies

requirements.txt

Download the data and pretrained/trained model checkpoints

Data: Put the data in ./data. Unpack all image in ./data/all_images and all scene.jsons (including teststd split) in ./data/simmc2_scene_jsons_dstc10_public/public.
Pretrained models: Checkpoints in ./pretrained and ./model/Transformers-VQA-master/models/pretrained. Download links in placeholder.txt in these folders.
Trained models: Checkpints in ./trained. Download from ./trained/placeholder.txt

Preprocess

Convert json files ~~using ./scripts/converter.py~~ *Currently not working. (Someone managed to lose the latest converter.py.) Download the processed data instead.
Get BERT/SBERT embeddings of non-visual prefab features using ./scripts/{get_KB_embedding, get_KB_embedding_SBERT, get_KB_embedding_no_duplicate}.py
Get CLIP/BUTD embeddigns for images using scripts ./scripts/get-visual-features-{CLIP, RCNN}.ipynb
Or just download everything from ./processed/placeholder.txt

Train

Under ./sh/train. See the arguments for used input.

Inference and evaluate

Under ./sh/infer_eval (devtest split) and ./sh/infer_eval_dev (dev split)
Outputs at ./output (same format as the original dialogue json).
Logits at ./output/logit {dialogue_idx: {round_idx: [[logit, label], ...]}}
run ./scripts/output_filter_error.py to select and reformat error cases.

Ensemble

cd script python ensemble --method optuna

output saved to output/logit/blended_devtest.json

Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

Related tags

Overview

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: arXiv

MMCoref_cleaned

Dependencies

Download the data and pretrained/trained model checkpoints

Preprocess

Train

Inference and evaluate

Ensemble

Owner

Yichen (William) Huang

Official source code to CVPR'20 paper, "When2com: Multi-Agent Perception via Communication Graph Grouping"

Offical code for the paper: "Growing 3D Artefacts and Functional Machines with Neural Cellular Automata" https://arxiv.org/abs/2103.08737

AFL binary instrumentation

Relative Uncertainty Learning for Facial Expression Recognition

🧑‍🔬 verify your TEAL program by experiment and observation

ReferFormer - Official Implementation of ReferFormer

A two-stage U-Net for high-fidelity denoising of historical recordings

Code for the paper "There is no Double-Descent in Random Forests"

This repository contains tutorials for the py4DSTEM Python package

Bolt Online Learning Toolbox

This repo holds the code of TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation

This is a JAX implementation of Neural Radiance Fields for learning purposes.

This repository contains the implementation of the HealthGen model, a generative model to synthesize realistic EHR time series data with missingness

Axel - 3D printed robotic hands and they controll with Raspberry Pi and Arduino combo

SCU OlympicsRunning Baseline

PrimitiveNet: Primitive Instance Segmentation with Local Primitive Embedding under Adversarial Metric (ICCV 2021)

This is the face keypoint train code of project face-detection-project

Attention-driven Robot Manipulation (ARM) which includes Q-attention

Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly

[ICLR'21] FedBN: Federated Learning on Non-IID Features via Local Batch Normalization