This repository provides a set functions to extract paragraphs from AWS Textract responses.

Overview

extract-paragraphs-with-aws-textract

Since AWS Textract (the AWS OCR service) does not have a native function to extract paragraphs, this repository provides a set of Python 3.X functions built on top of the AWS Python SDK (boto3) to extract paragraphs from AWS Textract responses.

PLEASE NOTE THAT:

  1. It is assumed that your client has the neccesary IAM permissions to access the different AWS resources required.
  2. Since AWS Textract analyze PDF files by running asynchronous operations, the current version assumes that you've already created an s3 bucket and that the PDF files are already stored there. If not, please go to the boto3 docs to know how to create a bucket as well as upload files.
  3. The paragraph_constructor is an ad hoc function for my use case. You may have to adapt it based on the space between lines in your data.

UPCOMING FEATURES:

  • Address abstract cases with the paragrpah_constructor function.
  • Export data in different formats.
  • AWS CloudFormation template for a serverless architecture to execute the functions when a new object is uploaded in your S3 bucket.

Please feel free to suggest new features or improvements to the current code. <3

Owner
Juan Anzola
Juan Anzola
DSAIL repos - DSAIL Repository Template

DSAIL Repository Template DSAIL @ KAIST . ├── configs ('--F', help='for configur

yunhak 2 Feb 14, 2022
Project for the discipline of Visual Data Analysis at EMAp FGV.

Analysis of the dissemination of fake news about COVID-19 on Twitter This project was the final work for the discipline of Visual Data Analysis of the

Giovani Valdrighi 2 Jan 17, 2022
Discord Mafia Game Bot using nextcord

Mafia-Bot Discord Mafia Game Bot using nextcord Features Mafia Game Game Replays Installation Run the following command to install required modules: p

Nian 6 Nov 19, 2022
A visualization of people a user follows on Twitter

Twitter-Map This software allows the user to create maps of Twitter accounts. Installation git clone Oliver Greenwood 12 Jul 20, 2022

MusicBot is the original Discord music bot written for Python 3.5+, using the discord.py library

The original MusicBot for Discord (formerly SexualRhinoceros/MusicBot)

Just Some Bots 2.9k Jan 02, 2023
Asynchronous Python Wrapper for the GoFile API

Asynchronous Python Wrapper for the GoFile API

Gautam Kumar 22 Aug 04, 2022
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work

Modern Data Lake Storage Layers This repository contains supporting assets for my research in modern Data Lake storage layers like Apache Hudi, Apache

Damon P. Cortesi 25 Oct 31, 2022
Bomber-X - A SMS Bomber made with Python

Bomber-X A SMS Bomber made with Python Linux/Termux apt update apt upgrade apt i

S M Shahriar Zarir 2 Mar 10, 2022
A Serverless Application Model stack that persists the $XRP price to the XRPL every minute as a TrustLine. There are no servers, it is effectively a "smart contract" in Python for the XRPL.

xrpl-price-persist-oracle-sam This is a XRPL Oracle that publishes external data into the XRPL. This Oracle was inspired by XRPL-Labs/XRPL-Persist-Pri

Joseph Chiocchi 11 Dec 17, 2022
Automate coin farming for dankmemer. Unlimited accounts at once. Uses a proxy

dankmemer-farm Simple script to farm Dankmemer coins with multiple accounts at once. Requires: Proxies, Discord Tokens Disclaimer I don't take respons

Scobra 12 Dec 20, 2022
ChairBot is designed to be reliable, easy to use, and lightweight for every user, and easliy to code add-ons for ChairBot.

ChairBot is designed to be reliable, easy to use, and lightweight for every user, and easliy to code add-ons for ChairBot. Ready to see whats possible with ChairBot?

1 Nov 08, 2021
UniHub API is my solution to bringing students and their universities closer

🎓 UniHub API UniHub API is my solution to bringing students and their universities closer... By joining UniHub, students will be able to join their r

Abdelbaki Boukerche 5 Nov 21, 2021
Projeto com o objetivo de aprender o funcionamento de Consumo de APIs.

Consumindo API SuperHero Projeto com o objetivo de aprender o funcionamento de Consumo de APIs.

Deivisson Henrique 1 Dec 30, 2021
Facebook fishing on telegram bot

Facebook-fishing Facebook fishing on telegram bot تثبيت الاداة pkg update -y pkg upgrade -y pkg install git -y pkg install python -y git clone https:/

sadamalsharabi 7 Oct 18, 2022
GroupMenter : New Telegram Group Manager Bot🔸Fast 🔸Python🔸Pyrogram 🔸

GroupMenter An PowerFull Group Manager Bot. Written In Pytelethon. Info • A modular Telegram Python bot running on python3. • Can be found on telegram

Group Menter 24 Jun 28, 2022
Pretend to be a discord bot

Pretendabot © Pretend to be a discord bot! About Pretendabot© is an app that lets you become a discord bot!. It uses discord intrigrations(webhooks) a

Advik 3 Apr 24, 2022
My beancount practice as a template

my-beancount-template 个人 Beancount 方案的模板仓库 相关博客 复式记账指北(一):What and Why? 复式记账指北(二):做账方法论 复式记账指北(三):如何打造不半途而废的记账方案 配置 详细配置请参考博客三。必须修改的配置有: Bot功能:data/be

KAAAsS 29 Nov 29, 2022
This app is providing you to track some online products' prices via GMAIL.

Price Tracking App variables and descriptions of that code is in Turkish language. but we're working on translate them into English. This app is provi

Abdullah Aslan 1 Dec 11, 2021
An advanced QR Code telegram bot with more features.

QR Code Bot A telegram qr code encode and decode bot Advanced Features 1. Database ( MongoDB ) Support 2. Broadcast Support 3. Status Command 4. Setti

Fayas Noushad 16 Nov 12, 2022
A minimal caching proxy to GitHub's REST & GraphQL APIs

github-proxy A caching forward proxy to GitHub's REST and GraphQL APIs. GitHub-Proxy is a thin, highly extensible, highly configurable python framewor

Babylon Health 26 Oct 05, 2022