The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.

Last update: Dec 01, 2022


Overview

Interscript

The Interscript dataset contains interactive user feedback on scripts generated by a T5-11B model.

Dataset

data.json contains the data in an easy-to-read JSON format, and data.jsonl contains the same data in JSONL format. The JSONL file contains 8466 samples, one sample per line. Every sample is a JSON object with the following fields:

    {
        "input_script": "push chair in -> pull chair in; pull chair in -> push chair against wall; push chair against wall -> straighten chair legs; straighten chair legs -> Push all chairs in; line up the chairs -> push chair in",
        "input_feedback": "One would not pull chair in if they had initially pushed it in.",
        "output_script": "push chair against wall -> straighten chair legs;straighten chair legs -> Push all chairs in;line up the chairs -> push chair in;push chair in -> push chair against wall",
        "metadata": {
            "id": "301KG0KX9BKTC0HB7Z9SV1Y5HAFH2Y.2_implicit.gp",
            "goal": "push all chairs in",
            "is_distractor": false,
            "feedback_type": "implicit.gp",
            "edit": "Remove node 'pull chair in'",
            "input_script_formatted": [
                "1. line up the chairs",
                "2. push chair in",
                "3. pull chair in",
                "4. push chair against wall",
                "5. straighten chair legs",
                "6. Push all chairs in"
            ],
            "output_script_formatted": [
                "1. line up the chairs",
                "2. push chair in",
                "3. push chair against wall",
                "4. straighten chair legs",
                "5. Push all chairs in"
            ]
        }
    }
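The script fields serialize a step graph as `source -> target` edges separated by semicolons. The sketch below, assuming that serialization holds throughout the file, reads JSONL one object per line and recovers a numbered ordering with a topological sort (Kahn's algorithm). That the `*_formatted` lists are produced this way is our assumption; it does reproduce `input_script_formatted` for the example above. The helpers `load_jsonl`, `parse_script`, and `order_steps` are hypothetical and not part of the dataset release.

```python
import json
from collections import defaultdict, deque
from typing import Iterable


def load_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse one JSON object per non-empty line, as in data.jsonl."""
    return [json.loads(line) for line in lines if line.strip()]


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split 'a -> b; c -> d' into [('a', 'b'), ('c', 'd')].

    Tolerates a missing space after ';', as in the example's output_script.
    """
    edges = []
    for part in script.split(";"):
        if not part.strip():
            continue
        src, dst = part.split("->")
        edges.append((src.strip(), dst.strip()))
    return edges


def order_steps(edges: list[tuple[str, str]]) -> list[str]:
    """Return the steps in a topological order (Kahn's algorithm).

    Nodes that lie on a cycle never reach in-degree 0 and are left out.
    """
    successors = defaultdict(list)
    in_degree: dict[str, int] = {}
    for src, dst in edges:
        successors[src].append(dst)
        in_degree.setdefault(src, 0)
        in_degree[dst] = in_degree.get(dst, 0) + 1
    # Start from steps with no incoming edge (the first step of the script).
    queue = deque(node for node, deg in in_degree.items() if deg == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    return order


# Usage on the example sample above, reduced to the one field used here.
line = json.dumps({
    "input_script": "push chair in -> pull chair in; pull chair in -> push chair against wall; "
                    "push chair against wall -> straighten chair legs; "
                    "straighten chair legs -> Push all chairs in; line up the chairs -> push chair in"
})
sample = load_jsonl([line])[0]
steps = order_steps(parse_script(sample["input_script"]))
formatted = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
print(formatted)
```

On the real file, pass an open handle instead: `with open("data.jsonl", encoding="utf-8") as f: samples = load_jsonl(f)`. Steps on a cycle would be silently dropped by `order_steps`, so a length check against the number of distinct steps is a cheap sanity test.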

The description of the fields is as follows:

input_script: Model-generated script $y_{bad}$.
input_feedback: User feedback $f$ on the input script.
output_script: Fixed output script $y_{good}$.
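Because $y_{bad}$ and $y_{good}$ use the same serialization, the change prompted by $f$ can be read off as a set difference of edges. A minimal sketch, assuming the `src -> dst; ...` format shown in the example; `parse_edges` and `edge_diff` are hypothetical helpers, and the two script strings are copied from the example sample.

```python
# Scripts copied from the example sample (input_script and output_script).
Y_BAD = ("push chair in -> pull chair in; pull chair in -> push chair against wall; "
         "push chair against wall -> straighten chair legs; "
         "straighten chair legs -> Push all chairs in; line up the chairs -> push chair in")
Y_GOOD = ("push chair against wall -> straighten chair legs;straighten chair legs -> Push all chairs in;"
          "line up the chairs -> push chair in;push chair in -> push chair against wall")


def parse_edges(script: str) -> set[tuple[str, str]]:
    """Return the 'src -> dst' edges of a script as a set of (src, dst) pairs."""
    edges = set()
    for part in script.split(";"):
        if part.strip():
            src, dst = part.split("->")
            edges.add((src.strip(), dst.strip()))
    return edges


def edge_diff(y_bad: str, y_good: str) -> tuple[set[tuple[str, str]], set[tuple[str, str]]]:
    """Return (edges present only in y_bad, edges present only in y_good)."""
    bad, good = parse_edges(y_bad), parse_edges(y_good)
    return bad - good, good - bad


removed, added = edge_diff(Y_BAD, Y_GOOD)
print("removed:", sorted(removed))
print("added:", sorted(added))
```

Comparing sets rather than strings ignores edge order and spacing, which differ between the example's input and output scripts even where the edges are identical.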

The metadata object contains additional information about the sample. The most important fields are:

id: Unique identifier of the sample.
goal: Goal of the script.
is_distractor: Whether the feedback is a distractor (please see Section 4 for more details).
feedback_type: Type of feedback (please see Section 4 "Annotation" for more details).
edit: The input_feedback expressed as an edit operation on the input script, i.e., the operation that transforms the input script into the output script (e.g., "Remove node 'pull chair in'").
input_script_formatted: The input script as a numbered list of steps.
output_script_formatted: The output script as a numbered list of steps.
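The `edit` field states the correction symbolically. The sketch below applies edits of the form `Remove node '<step>'` to the edge set of `input_script`. It assumes, as the example suggests, that removing a step reconnects each of its predecessors directly to each of its successors; other edit phrasings are not documented here, so the hypothetical `apply_remove_node` rejects them rather than guessing. On the example sample, the result equals the edge set of `output_script`.

```python
import re

# Scripts copied from the example sample (input_script and output_script).
Y_BAD = ("push chair in -> pull chair in; pull chair in -> push chair against wall; "
         "push chair against wall -> straighten chair legs; "
         "straighten chair legs -> Push all chairs in; line up the chairs -> push chair in")
Y_GOOD = ("push chair against wall -> straighten chair legs;straighten chair legs -> Push all chairs in;"
          "line up the chairs -> push chair in;push chair in -> push chair against wall")


def parse_edges(script: str) -> set[tuple[str, str]]:
    """Return the 'src -> dst' edges of a script as a set of (src, dst) pairs."""
    edges = set()
    for part in script.split(";"):
        if part.strip():
            src, dst = part.split("->")
            edges.add((src.strip(), dst.strip()))
    return edges


def apply_remove_node(edges: set[tuple[str, str]], edit: str) -> set[tuple[str, str]]:
    """Apply an edit of the form "Remove node '<step>'" to an edge set.

    Assumption: the removed step's predecessors are linked straight to its successors.
    """
    match = re.fullmatch(r"Remove node '(.+)'", edit)
    if match is None:
        raise ValueError(f"unsupported edit: {edit!r}")
    node = match.group(1)
    preds = {src for src, dst in edges if dst == node}
    succs = {dst for src, dst in edges if src == node}
    # Drop every edge touching the node, then bridge the gap it leaves.
    kept = {(src, dst) for src, dst in edges if node not in (src, dst)}
    return kept | {(p, s) for p in preds for s in succs}


fixed = apply_remove_node(parse_edges(Y_BAD), "Remove node 'pull chair in'")
print(fixed == parse_edges(Y_GOOD))
```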

Data collection process

We use Amazon Mechanical Turk to collect user feedback on erroneous scripts.
An overview of the process is shown in the following figure:

[Figure: overview of the data collection process]

Amazon Mechanical Turk Template

turk_template.html contains the template for the Amazon Mechanical Turk HITs (Human Intelligence Tasks).


Owner

AI2
