Contract Understanding Atticus Dataset

This repository contains code for the Contract Understanding Atticus Dataset (CUAD), a dataset for legal contract review curated by the Atticus Project. It accompanies the paper CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review by Dan Hendrycks, Collin Burns, Anya Chen, and Spencer Ball.

Contract review is a task of "finding needles in a haystack." We find that Transformer models have nascent performance on CUAD, and that this performance is strongly influenced by model design and training dataset size. Despite some promising results, there is still substantial room for improvement. As one of the few large, specialized NLP benchmarks annotated by experts, CUAD can serve as a challenging research benchmark for the broader NLP community.

For more details about CUAD and legal contract review, see the Atticus Project website.

Trained Models

We provide checkpoints for three of the best models fine-tuned on CUAD: RoBERTa-base (~100M parameters), RoBERTa-large (~300M parameters), and DeBERTa-xlarge (~900M parameters).

Requirements

This repository requires the HuggingFace Transformers library. It was tested with Python 3.8, PyTorch 1.7, and Transformers 4.3/4.4.
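
As a quick sanity check before running anything, the sketch below compares the installed package versions against the ones listed above. It is not part of the repository: the helper names (`TESTED`, `major_minor`, `check_environment`) are hypothetical, and it assumes the packages are installed under the distribution names `torch` and `transformers`. Other versions may well work; this only flags when an environment differs from the tested one.

```python
import sys
from importlib import metadata

# Versions the README reports testing with (major.minor only).
# The dictionary keys are assumed pip distribution names.
TESTED = {
    "torch": ("1.7",),
    "transformers": ("4.3", "4.4"),
}


def major_minor(version: str) -> str:
    """Reduce a version string such as '4.4.2' or '1.7.1+cu110' to 'X.Y'."""
    parts = version.split("+")[0].split(".")
    return ".".join(parts[:2])


def check_environment(tested=TESTED):
    """Return {package: (installed_version_or_None, matches_tested_version)}."""
    report = {}
    for package, tested_versions in tested.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            report[package] = (None, False)
            continue
        report[package] = (installed, major_minor(installed) in tested_versions)
    return report


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} (tested with 3.8)")
    for package, (installed, ok) in check_environment().items():
        status = "matches tested version" if ok else "differs from tested version"
        print(f"{package}: {installed or 'not installed'} ({status})")
```

A mismatch here is not necessarily an error; it only means the environment falls outside the combination the repository was tested with.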

Citation

If you find this useful in your research, please consider citing:

@article{hendrycks2021cuad,
      title={CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review}, 
      author={Dan Hendrycks and Collin Burns and Anya Chen and Spencer Ball},
      journal={arXiv preprint arXiv:2103.06268},
      year={2021}
}
