Repositorio com arquivos processados da CPI da COVID para facilitar analise

Related tags

Miscellaneouscpi4all
Overview

cpi4all

Repositorio com arquivos processados da CPI da COVID para facilitar analise

Organização

No site do senado é possivel encontrar a lista de todos os documentos coletados pela CPI da COVID.

A tabela no site possui a seguinte estrutura:

No Arquivos Data de recebimento Remetente Origem Descrição Caixa Em Resposta
1 Link1 ... ... ... ... ... ...
2 Link2/link3 ... ... ... ... ... ...

Esses links levam ao download de arquivos PDF com os documentos em questão.

Nesse repositorio você podera encontrar a versão txt desses arquivos. O nome do arquivo nesse repositorio é formado por <No do documento>_<numero do link>. Por exemplo:

link1 = 1_1 porque ele é relativo ao arquivo No 1, e é o primeiro link.

link2 = 2_1 porque ele é relativo ao arquivo No 2, e é o primeiro link dessa linha.

link3 = 2_2 porque ele é relativo ao arquivo No 2, e é o segundo link da linha.

A versão texto de todos os documentos está na pasta database/txts/.

Exemplos:

Arquivo No 1, primeiro link: 1_1

Arquivo No 4, quarto link: 3_4

Nota 1: Nem todos os arquivos foram convertidos ainda

Nota 2: A conversão usa reconhecimento de imagem e pode ficar bem ruim as vezes, gerando erros ortograficos ou palavras sem nexo algum.

Para desenvolvedores

Os scripts funcionam na seguinte sequencia:

  1. extract_rows.py: Vai no site do senado e extrai as informações de cada linha da tabela. Todos os dados são salvos em database/rows.
  2. extract_headers.py: Para cada link em cada linha, esse script pega metadados do arquivo (tamanho, tipo) que vão ser uteis depois. Esses dados são salvos em database/headers.
  3. download_pdfs.py: Baixa todos os PDFs descritos em database/headers e salva em database/pdfs.
  4. convert_pdf_to_jpg.py: Converte todos os PDFs em database/pdfs para imagens em database/jpgs.
  5. convert_jpg_to_txt.py: Converte todos as imagens em database/jpgs para texto em database/txt.

Por motivos de performance, apenas as pastas database/rows, database/headers e database/txts sao salvas nesse repositorio.

TODO: 0. Melhorar esse readme :)

  1. Usar o githubpages para gerar um site estatico que permite pesquisar em todos os txt
  2. Terminar de converter todos os arquivos
  3. Investigar arquivos em que a conversão ficou pessima.
  4. Fazer extração automatica de datas e prover um json com a ordem cronologica dos arquivos.
Owner
Breno Rodrigues Guimarães
Breno Rodrigues Guimarães
In this project , I play with the YouTube data API and extract trending videos in Nigeria on a particular day

YouTubeTrendingVideosAnalysis In this project , I played with the YouTube data API and extracted trending videos in Nigeria on a particular day. This

1 Jan 11, 2022
banking system with python, beginner friendly, preadvanced level

banking-system-python banking system with python, beginner friendly, preadvanced level Used topics Functions else/if/elif dicts methods parameters hol

Razi Falah 1 Feb 03, 2022
Rates how pog a word or user is. Not random and does have *some* kind of algorithm to it.

PogRater :D Rates how pogchamp a word is :D A fun project coded by JBYT27 using Python3 Have you ever wondered how pog a word is? Well, congrats, you

an aspirin 2 Jun 25, 2022
RangDev Notepad App With Python

RangDev Notepad-App-With-Python Take down quick and speedy notes! This is a small project of a notepad app built with Tkinter and SQLite3. Database cr

rangga.alrasya 1 Dec 01, 2021
A Python Web Application for Checking vaccine slots by pincodes and auto slot booking.

The Dashboard is developed using Bokeh and python 3.5+. This dashboard is useful for you if you are looking for something which will help you to book the vaccine slot once slots become available. Oth

Suraj Deshmukh 10 Jan 23, 2022
System Information Utility With Python

System-Information-Utility This is a simple utility, for the terminal, which allows you to find out information about your PC. It's very easy to run t

2 Apr 15, 2022
Backend Interview Challenge

Inspect HOA backend challenge This is a simple flask repository with some endpoints and requires a few more endpoints. It follows a simple MVP (model-

1 Jan 20, 2022
🤡 Multiple Discord selfbot src deobfuscated !

Deobfuscated selfbot sources About. If you whant to add src, please make pull requests. If you whant to deobfuscate src, send mail to

Sreecharan 5 Sep 13, 2021
samples of neat code

NEAT-samples Some samples of code and config files for use with the NEAT-Python package These samples are largely copy and pasted, so if you

Harrison 50 Sep 28, 2022
A command-line utility that creates projects from cookiecutters (project templates), e.g. Python package projects, VueJS projects.

Cookiecutter A command-line utility that creates projects from cookiecutters (project templates), e.g. creating a Python package project from a Python

18.6k Jan 02, 2023
Code and yara rules to detect and analyze Cobalt Strike

Cobalt Strike Resources This repository contains: analyze.py: a script to analyze a Cobalt Strike beacon (python analyze.py BEACON) extract.py; extrac

Tek 224 Jan 04, 2023
SiliconCompiler is an open source compiler framework that automates translation from source code to silicon.

SiliconCompiler is an open source compiler framework that aims to automate translation from source code to silicon.

siliconcompiler 539 Jan 04, 2023
E5自动续期

AutoApi v6.3 (2021-2-18) ———— E5自动续期 AutoApi系列: AutoApi(v1.0) 、 AutoApiSecret(v2.0) 、 AutoApiSR(v3.0) 、 AutoApiS(v4.0) 、 AutoApiP(v5.0) 说明 E5自动续期程序,但是

34 Feb 20, 2021
Originally used during Marketplace.tf's open period, this program was used to get the profit of items bought with keys and sold for dollars.

Originally used during Marketplace.tf's open period, this program was used to get the profit of items bought with keys and sold for dollars. Practically useless for me now, but can be used as an exam

BoggoTV 1 Dec 11, 2021
Nook is a simple, concatenative programming language written in Python.

Nook Nook is a simple, concatenative programming language written in Python. Status Nook is currently WIP. It lacks a lot of basic feature, and will n

Wumi4 4 Jul 20, 2022
Create an application to visualize single/multiple Xandar Kardian people counting sensors detection result for a indoor area.

Program Design Purpose: We want to create an application to visualize single/multiple Xandar Kardian people counting sensors detection result for a indoor area.

2 Dec 28, 2022
This is a modified variation of abhiTronix's vidgear. In this variation, it is possible to write the output file anywhere regardless the permissions.

Info In order to download this package: Windows 10: Press Windows+S, Type PowerShell (cmd in older versions) and hit enter, Type pip install vidgear_n

Ege Akman 3 Jan 30, 2022
A tool to improve Boolean satisfiability (SAT) solver user's life

SatHelper This is a tool to improve the Boolean satisfiability (SAT) and MaxSAT solver user's life. It helps you model various problems as SAT and Max

Tomas Balyo 1 Nov 16, 2021
A type based dependency injection framework for Python 3.9+

Alluka A type based dependency injection framework for Python 3.9+. Installation You can install Alluka from PyPI using the following command in any P

Lucina 16 Dec 15, 2022
Async Python Circuit Breaker implementation

aiocircuitbreaker This is an async Python implementation of the circuitbreaker library. Installation The project is available on PyPI. Simply run: $ p

5 Sep 05, 2022