Automate the case review on legal case documents and find the most critical cases using network analysis

Overview

Automation on Legal Court Cases Review

This project is to automate the case review on legal case documents and find the most critical cases using network analysis.

Short write-up

Affiliation: Institute for Social and Economic Research and Policy, Columbia University

Project Information:

Keywords: Automation, PDF parse, String Extraction, Network Analysis

Software:

  • Python : pdfminer, LexNLP, nltk sklearn
  • R: igraph

Scope:

  1. Parse court documents, extract citations from raw text.
  2. Build citation network, identify important cases in the network.
  3. Extract judge's opinion text and meta information including opinion author, court, decision.
  4. Model training to predict court decision based on opinion text.

Polit Study on 159 Legal Court Documents (in pilot_159 folder)

1. Process PDF documents using Python

Ipython Notebook Description
1.Extraction by LexNLP.ipynb Extract meta inforation use LexNLP package.
2.Layer Analysis on Sigle File. ipynb Use pdfminer to extract the raw text and the paragraph segamentation in the PDF document.
3.Patent Position by Layer.ipynb Identify the position of patent number in extracted layers from PDF.
4.Opinion and Author by Layer.ipynb Extract opinion text, author, decisions from the layers list.
5.Wrap up to Meta Data.ipynb Store extracted meta data to .json or .csv
6.Visualize citation frequency.ipynb Bar plot of the citation frequencies

2. Data: Parse PDF documents via Python

These datasets are NOT included in this public repository for intellectual property and privacy concern

File
pdf2text159.json A dictionary of 3 list: file_name, raw_text, layers.
cite_edge159.csv Edge list of citation network
cite_node159.csv Meta information of each case: case_number, court, dates
reference_extract.csv cited cases in a list for every case, untidy format for analysis
citation159.csv file citation pair, tidy format for calculation
regulation159.csv file regulation pair, tidy format for calculation

3. Analyze and Visualize using R

File
Calculate Citation Frequency.Rmd Analyze reference_extract.csv
Citation Network.Rmd Analyze cite_edge159

4. Visulization Chart Sample

Citation Frequencycase_freq

Citation Networkcitation_net

Network Visulization and Predictive Modeling on 854 Legal Court Cases (in Extraction_Modelling folder)

1. Extract opinion and meta information from raw text data

.ipynb notebook Description
Full Dataset Merge.ipynb Merge the 854 cases dataset
Edge and Node List.ipynb Create edge and node list
Full Extractions.ipynb Extract author, judge panel, opinion text
Clean Opinion Text.ipynb Remove references and special characters in opinion text

2. Datasets

These datasets are NOT included in this public repository for intellectual property and privacy concern

Dataset Description
amy_cases.json large dictionary {file name: raw text} for 854 cases, from Lilian's PDF parsing
full_name_text.json convert amy_cases.json key value pair to two list: file_name, raw_text
cite_edge.csv edge list of citation
cite_node.csv node list contains case_code, case_name, court_from, court_type
extraction854.csv full extractions include case_code, case_name, court_from, court_type, result, author, judge_panel
decision_text.json json file include author, decision(result of the case), opinion (opinion text), cleaned_text (cleaned opinion text)
cleaned_text.csv csv file contains allt the cleaned text
predict_data.csv cleaned dataset for NLP modeling predict court decision

3. Visulization using R

R markdown file
Full Network Graph.Rmd draw the full citation network
Citation Betwwen Nodes.Rmd draw citation between all the available cases
Clean Data For Predictive Modelling.rmd clean text data for predictive modeling

Interactive Graph

Play with Interactive Graph

Full Citation Network (all cases and cited cases)

Citation Between Available Cases

4. Predictive Modeling using Python

ipynb notebook
NLP Predictive Modeling.ipynb Try different preprocessing, and build a logistic regression to predict court decision.

Visulization of the Bi-gram (words) with the strongest coefficient

Bigram

Owner
Yi Yin
Tech & Business Alignment @ Wolfram Research, Social Sciences Research @ Columbia University
Yi Yin
Plot toolbox based on Matplotlib, simple and elegant.

Elegant-Plot Plot toolbox based on Matplotlib, simple and elegant. 绘制效果 绘制过程 数据准备 每种图标类型的目录下有data.csv文件,依据样例数据填入自己的数据。

3 Jul 15, 2022
RockNext is an Open Source extending ERPNext built on top of Frappe bringing enterprise ready utilization.

RockNext is an Open Source extending ERPNext built on top of Frappe bringing enterprise ready utilization.

Matheus Breguêz 13 Oct 12, 2022
a simple REPL display lib for circuitpython

Circuitpython-termio-lib a simple REPL display lib for circuitpython Fonctions cls clear terminal screen and set cursor on top left : coords 0,0 usage

BeBoXoS 1 Nov 17, 2021
Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

AutoViz Automatically Visualize any dataset, any size with a single line of code. AutoViz performs automatic visualization of any dataset with one lin

AutoViz and Auto_ViML 1k Jan 02, 2023
Epagneul is a tool to visualize and investigate windows event logs

epagneul Epagneul is a tool to visualize and investigate windows event logs. Dep

jurelou 190 Dec 13, 2022
Pretty Confusion Matrix

Pretty Confusion Matrix Why pretty confusion matrix? We can make confusion matrix by using matplotlib. However it is not so pretty. I want to make con

Junseo Ko 5 Nov 22, 2022
A workshop on data visualization in Python with notebooks and exercises for following along.

Beyond the Basics: Data Visualization in Python The human brain excels at finding patterns in visual representations, which is why data visualizations

Stefanie Molin 162 Dec 05, 2022
Geocoding library for Python.

geopy geopy is a Python client for several popular geocoding web services. geopy makes it easy for Python developers to locate the coordinates of addr

geopy 3.8k Jan 02, 2023
Tweets your monthly GitHub Contributions as Wordle grid

Tweets your monthly GitHub Contributions as Wordle grid

Venu Vardhan Reddy Tekula 5 Feb 16, 2022
Farhad Davaripour, Ph.D. 1 Jan 05, 2022
An XLSX spreadsheet renderer for Django REST Framework.

drf-renderer-xlsx provides an XLSX renderer for Django REST Framework. It uses OpenPyXL to create the spreadsheet and returns the data.

The Wharton School 166 Dec 01, 2022
Bokeh Plotting Backend for Pandas and GeoPandas

Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of

Patrik Hlobil 822 Jan 07, 2023
Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js

pivottablejs: the Python module Drag’n’drop Pivot Tables and Charts for Jupyter/IPython Notebook, care of PivotTable.js Installation pip install pivot

Nicolas Kruchten 512 Dec 26, 2022
Jupyter notebook and datasets from the pandas Q&A video series

Python pandas Q&A video series Read about the series, and view all of the videos on one page: Easier data analysis in Python with pandas. Jupyter Note

Kevin Markham 2k Jan 05, 2023
Some examples with MatPlotLib library in Python

MatPlotLib Example Some examples with MatPlotLib library in Python Point: Run files only in project's directory About me Full name: Matin Ardestani Ag

Matin Ardestani 4 Mar 29, 2022
High-level geospatial data visualization library for Python.

geoplot: geospatial data visualization geoplot is a high-level Python geospatial plotting library. It's an extension to cartopy and matplotlib which m

Aleksey Bilogur 1k Jan 01, 2023
A tool for creating Toontown-style nametags in Panda3D

Toontown-Nametag Toontown-Nametag is a tool for creating Toontown Online/Toontown Rewritten-style nametags in Panda3D. It contains a function, createN

BoggoTV 2 Dec 23, 2021
A python wrapper for creating and viewing effects for Matt Parker's christmas tree.

Christmas Tree Visualizer A python wrapper for creating and viewing effects for Matt Parker's christmas tree. Displays py or csv effect files and allo

4 Nov 22, 2022
Apache Superset is a Data Visualization and Data Exploration Platform

Apache Superset is a Data Visualization and Data Exploration Platform

The Apache Software Foundation 49.9k Jan 02, 2023
Using SQLite within Python to create database and analyze Starcraft 2 units data (Pandas also used)

SQLite python Starcraft 2 English This project shows the usage of SQLite with python. To create, modify and communicate with the SQLite database from

1 Dec 30, 2021