This repository provides a set functions to extract paragraphs from AWS Textract responses.

Overview

extract-paragraphs-with-aws-textract

Since AWS Textract (the AWS OCR service) does not have a native function to extract paragraphs, this repository provides a set of Python 3.X functions built on top of the AWS Python SDK (boto3) to extract paragraphs from AWS Textract responses.

PLEASE NOTE THAT:

  1. It is assumed that your client has the neccesary IAM permissions to access the different AWS resources required.
  2. Since AWS Textract analyze PDF files by running asynchronous operations, the current version assumes that you've already created an s3 bucket and that the PDF files are already stored there. If not, please go to the boto3 docs to know how to create a bucket as well as upload files.
  3. The paragraph_constructor is an ad hoc function for my use case. You may have to adapt it based on the space between lines in your data.

UPCOMING FEATURES:

  • Address abstract cases with the paragrpah_constructor function.
  • Export data in different formats.
  • AWS CloudFormation template for a serverless architecture to execute the functions when a new object is uploaded in your S3 bucket.

Please feel free to suggest new features or improvements to the current code. <3

Owner
Juan Anzola
Juan Anzola
Roblox-Account-Gen - A simple account generator not using paid solving services

Roblox Account Generator Star this if it helped to spread awareness! No 2captcha

x 1 Feb 17, 2022
Simple Bot With Python 3.8+ For Converstaion Your Media

Media-Conversation Simple Bot With Python 3.8+ For Converstaion Your Media

Farzin 2 Dec 06, 2021
DonLee Robot

🤖 𝐃𝐎𝐍 𝐋𝐄𝐄 𝐑𝐎𝐁𝐎𝐓 𝐕𝟐 🤖 👋 Hey Muhammed, Iam DonLee RoBoT Make me an admin for your group and channel then connect me.... 🎉 🙂 To build a

Muhammed 27 Dec 01, 2022
UniHub API is my solution to bringing students and their universities closer

🎓 UniHub API UniHub API is my solution to bringing students and their universities closer... By joining UniHub, students will be able to join their r

Abdelbaki Boukerche 5 Nov 21, 2021
Erhalten Sie wichtige Warnmeldungen des Bevölkerungsschutzes für Gefahrenlagen wie zum Beispiel Gefahrstoffausbreitung oder Unwetter per Programmierschnittstelle.

nina-api Erhalten Sie wichtige Warnmeldungen des Bevölkerungsschutzes für Gefahrenlagen wie zum Beispiel Gefahrstoffausbreitung oder Unwetter per Prog

Bundesstelle für Open Data 68 Dec 19, 2022
Feedback-TelegramBot is a resemblance bot which can be deployed on server

Feedback-TelegramBot Feedback-TelegramBot is a resemblance bot which can be deployed on server This work is based on Telegram library, thanks to their

2 Jan 03, 2022
GitNews: Github webhooks for Telegram

GitNews - Github webhooks for Telegram Setup: server: clone repo git clone https

Druv Jagdish 1 Feb 14, 2022
A simple library for interacting with Amazon SQS.

qoo is a very simple Amazon SQS client, written in Python. It aims to be much more straight-forward to use than boto3, and specializes only in Amazon

Jacobi Petrucciani 2 Oct 30, 2020
The official source code for Ghost Discord selfbot.

👻 Ghost Selfbot The official code for Ghost which was recently discontinued and released to the public. Feel free to use any of the code found in thi

Ghost 121 Nov 09, 2022
Solves bombcrypto newest captcha

Solves Bombcrypto newest captcha A very compact implementation using just cv2 and ctypes, ready to be deployed to your own project. How does it work I

19 May 06, 2022
A simple python script for rclone. Use multiple Google Service Accounts and cycle through them.

About GSAclone GSAclone is a simple python script for rclone, written with the purpose of using multiple Google service accounts on Google Drive and "

Shiro39 6 Feb 25, 2022
Cdk-python-crud-app - CDK Python CRUD App

Welcome to your CDK Python project! You should explore the contents of this proj

Shapon Sheikh 1 Jan 12, 2022
A fun hangman style game to guess random movie names with a short summary about the movie.

hang-movie-man Hangman but for movies 😉 This is a fun hangman style game to guess random movie names from the local database and show some summary ab

Ankit Josh 10 Sep 07, 2022
PaddleOCR推理的pytorch实现和模型转换

PaddleOCR2Pytorch 简介 ”真·白嫖“PaddleOCR 注意 PytorchOCR由PaddleOCR-2.0rc1+动态图版本移植。 特性 高质量推理模型,准确的识别效果 超轻量ptocr_mobile移动端系列 通用ptocr_server系列 支持中英文数字组合识别、竖排文本

519 Jan 08, 2023
Scrape Twitter for Tweets

Backers Thank you to all our backers! 🙏 [Become a backer] Sponsors Support this project by becoming a sponsor. Your logo will show up here with a lin

Ahmet Taspinar 2.2k Jan 02, 2023
Neubot client

Neubot, the network neutrality bot Neubot is a research project on network neutrality of the Nexa Center for Internet & Society at Politecnico di Tori

Neubot 57 Nov 02, 2021
Python SDK for Thepeer

Python SDK for Thepeer

Oluwafemi Tairu 2 Dec 22, 2021
DeleteAllBot - Telegram bot to delete all messages in a group

Delete All Bot A star ⭐ from you means a lot to me ! Telegram bot to delete all

Stark Bots 15 Dec 26, 2022
Instagram-Reports is a tool made to ban any scam or bad person

ABOUT TOOL : Instagram-Reports is a tool made to ban any scam or bad person. Installation : sudo apt-get update -y sudo apt-get upgrade -y apt insta

Evan Al Mahmud Irfan ථ 1 Dec 20, 2021
A small and fun Discord Bot that is written in Python and discord-interactions (with discord.py)

Articuno (discord-interactions) A small and fun Discord Bot that is written in Python and discord-interactions (with discord.py) Get started If you wa

Blue 8 Dec 26, 2022