Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.

Last update: Dec 18, 2022

Overview

What is Markup?

Usage

A full-featured version of Markup is available both on the website and as a local installation.

Online

The online version of Markup can be found here.

Local Server

Docker

Run docker run -d -p 8000:8000 samueldobbie/markup and visit http://localhost:8000.

Manual Installation

Clone or download the repository.
Run python setup.py using 64-bit Python 3.
Visit http://localhost:8000.

For further sessions, the local server can be started directly by running python manage.py runserver localhost:8000.
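Scripts may need to wait for the server before visiting http://localhost:8000. This can happen after `docker run -d` returns, or after `python manage.py runserver` is launched in the background. The following is a minimal readiness-check sketch using only the Python standard library. `wait_for_server` is a hypothetical helper, not part of Markup, and it assumes the default address http://localhost:8000 used above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:8000",
                    timeout: float = 30.0,
                    interval: float = 1.0) -> bool:
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass.

    Hypothetical helper (not part of Markup). Returns True as soon as any
    HTTP response arrives, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # A 4xx/5xx reply still means something is listening on the port.
            return True
        except OSError:
            # Connection refused or timed out (URLError is an OSError subclass):
            # the server is not ready yet.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))


if __name__ == "__main__":
    # Assumes Markup was started on the default port described above.
    print("Markup is up" if wait_for_server() else "Markup did not respond")
```

Any HTTP status counts as "up", because the goal is only to know that the port is being served. Whether Markup's start page loads correctly is a separate question.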

Documentation

Documentation to help with setting up and using Markup can be found here.

Features

Ability to navigate between and annotate multiple documents in a single session.
Predictive annotation suggestions (incl. attributes) using underlying active learning and sequence-to-sequence models.
Integrated access to pre-loaded and user-defined ontologies, enabling predictive mappings and direct querying.
Built-in configuration file creator.
Built-in synthetic data generator and custom model trainer (local version only due to high computational expense).
Dynamic attribute display.
Any number of overlaying annotations, enabling the capture of complex data.
Full-featured tool available via local installation and website.
Dark mode.
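Overlaying annotations are easiest to picture as stand-off spans: character offsets into the document, each with a label and optional attributes. Any number of spans may share characters. The sketch below illustrates that idea only. The `Annotation` class, its fields and the `overlapping_pairs` helper are hypothetical and are not Markup's internal or export format, which this README does not describe.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: a labelled, half-open character span."""
    start: int                # inclusive character offset
    end: int                  # exclusive character offset
    label: str
    attributes: tuple = ()    # e.g. (("Negated", "True"),)

    def text(self, document: str) -> str:
        """Return the annotated text slice."""
        return document[self.start:self.end]


def overlapping_pairs(
    annotations: Sequence[Annotation],
) -> List[Tuple[Annotation, Annotation]]:
    """Return every pair of annotations whose half-open spans intersect."""
    return [
        (a, b)
        for a, b in combinations(annotations, 2)
        if a.start < b.end and b.start < a.end
    ]


if __name__ == "__main__":
    doc = "Patient denies chest pain."
    symptom = Annotation(15, 25, "Symptom", (("Negated", "True"),))
    site = Annotation(15, 20, "BodySite")      # nested inside the symptom span
    cue = Annotation(8, 14, "NegationCue")     # adjacent, but not overlapping
    for a, b in overlapping_pairs([symptom, site, cue]):
        print(a.label, repr(a.text(doc)), "overlaps", b.label, repr(b.text(doc)))
```

Storing offsets rather than inline tags is what lets one span ("chest") sit inside another ("chest pain") without any nesting rules.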

Future Plans

Add user accounts.
Add ability for users to join a team and share ontologies, documents, guidelines, annotations, etc.
Accessible version for colour-blind users.
Add ability to perform text and image classification.
Add ability to annotate images.

Known Bugs / Issues

Annotations may be offset when annotating across newlines in CRLF (Windows) text documents. The offset is purely visual; the exported indices will be correct.
When using the website version of Markup, certain features may freeze while annotations are being predicted.
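A visual-only offset like the CRLF issue above commonly arises when the displayed text has its line endings normalised to LF while offsets are counted on the raw file. Each "\r\n" then occupies two characters in the raw text but one on screen. The sketch below illustrates that general mechanism under this assumption. It is not confirmed as Markup's exact cause, and `crlf_display_offset` is a hypothetical helper.

```python
def crlf_display_offset(text: str, index: int) -> int:
    """Map a character index in raw CRLF text to its index after the text's
    CRLF line endings are normalised to LF.

    Every "\r\n" that lies entirely before `index` collapses from two
    characters to one, so the position shifts left by one per such pair.
    """
    return index - text.count("\r\n", 0, index)


if __name__ == "__main__":
    raw = "first line\r\nsecond line\r\nthird"
    shown = raw.replace("\r\n", "\n")
    i = raw.index("third")
    print(i, crlf_display_offset(raw, i), shown[crlf_display_offset(raw, i):])
```

Under this assumption the raw-file index (the one exported) is the stable reference. The displayed position lags it by the number of preceding CRLFs, which would explain why the drift grows the further an annotation sits in the document.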

Owner

Samuel Dobbie
