Script em python para carregar os arquivos de cnpj dos dados públicos da Receita Federal em MYSQL.

Overview

cnpj-mysql

Script em python para carregar os arquivos de cnpj dos dados públicos da Receita Federal em MYSQL.

Dados públicos de cnpj no site da Receita:

A partir de 2021 os dados da Receita Federal estão disponíveis no link https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/cadastros/consultas/dados-publicos-cnpj (ou http://200.152.38.155/CNPJ/) em formato csv zipado.

Pré-requisitos:

Python 3.8;
Bibliotecas pandas, dask, sqlalchemy e pymysql.

Utilizando o script:

Baixe todos os arquivos zipados do site da Receita e salve na pasta "dados-publicos-zip".
O download no site da Receita é lento, pode demorar várias horas. Sugiro utilizar um gerenciador de downloads.

Crie uma pasta com o nome "dados-publicos".

No servidor MYSQL, crie um database, por exemplo, cnpj.
Especifique os parâmetros no começo do script:
dbname = 'cnpj'
username = 'root'
password = ''
host = '127.0.0.1'

Para iniciar esse script, em um console DOS digite
python dados_cnpj_para_mysql.py

A execução durou cerca de 5hs em um notebook i7 de 8a geração.

Outras referências:

Para trabalhar com os dados de cnpj no formato SQLITE, use o meu projeto (https://github.com/rictom/cnpj-sqlite).
O projeto (https://github.com/rictom/rede-cnpj) utiliza os dados públicos de CNPJ para visualização de relacionamentos entre empresas e sócios.

Histórico de versões

versão 0.1 (novembro/2021)

  • primeira versão
A Python wheel containing PostgreSQL

postgresql-wheel A Python wheel for Linux containing a complete, self-contained, locally installable PostgreSQL database server. All servers run as th

Michel Pelletier 71 Nov 09, 2022
A wrapper around asyncpg for use with sqlalchemy

asyncpgsa A python library wrapper around asyncpg for use with sqlalchemy Backwards incompatibility notice Since this library is still in pre 1.0 worl

Canopy 404 Dec 03, 2022
A wrapper for SQLite and MySQL, Most of the queries wrapped into commands for ease.

Before you proceed, make sure you know Some real SQL, before looking at the code, otherwise you probably won't understand anything. Installation pip i

Refined 4 Jul 30, 2022
Python interface to Oracle Database conforming to the Python DB API 2.0 specification.

cx_Oracle version 8.2 (Development) cx_Oracle is a Python extension module that enables access to Oracle Database. It conforms to the Python database

Oracle 841 Dec 21, 2022
DataStax Python Driver for Apache Cassandra

DataStax Driver for Apache Cassandra A modern, feature-rich and highly-tunable Python client library for Apache Cassandra (2.1+) and DataStax Enterpri

DataStax 1.3k Dec 25, 2022
Pony Object Relational Mapper

Downloads Pony Object-Relational Mapper Pony is an advanced object-relational mapper. The most interesting feature of Pony is its ability to write que

3.1k Jan 04, 2023
An asyncio compatible Redis driver, written purely in Python. This is really just a pet-project for me.

asyncredis An asyncio compatible Redis driver. Just a pet-project. Information asyncredis is, like I've said above, just a pet-project for me. I reall

Vish M 1 Dec 25, 2021
The JavaScript Database, for Node.js, nw.js, electron and the browser

The JavaScript Database Embedded persistent or in memory database for Node.js, nw.js, Electron and browsers, 100% JavaScript, no binary dependency. AP

Louis Chatriot 13.2k Jan 02, 2023
a small, expressive orm -- supports postgresql, mysql and sqlite

peewee Peewee is a simple and small ORM. It has few (but expressive) concepts, making it easy to learn and intuitive to use. a small, expressive ORM p

Charles Leifer 9.7k Dec 30, 2022
A library for python made by me,to make the use of MySQL easier and more pythonic

my_ezql A library for python made by me,to make the use of MySQL easier and more pythonic This library was made by Tony Hasson , a 25 year old student

3 Nov 19, 2021
Py2neo is a comprehensive toolkit for working with Neo4j from within Python applications or from the command line.

Py2neo v3 Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line. The core library ha

64 Oct 14, 2022
Pysolr — Python Solr client

pysolr pysolr is a lightweight Python client for Apache Solr. It provides an interface that queries the server and returns results based on the query.

Haystack Search 626 Dec 01, 2022
TileDB-Py is a Python interface to the TileDB Storage Engine.

TileDB-Py TileDB-Py is a Python interface to the TileDB Storage Engine. Quick Links Installation Build Instructions TileDB Documentation Python API re

TileDB, Inc. 149 Nov 28, 2022
An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets

datasets_sql A 🤗 Datasets extension package that provides support for executing arbitrary SQL queries on HF datasets. It uses DuckDB as a SQL engine

Mario Šaško 19 Dec 15, 2022
A fast PostgreSQL Database Client Library for Python/asyncio.

asyncpg -- A fast PostgreSQL Database Client Library for Python/asyncio asyncpg is a database interface library designed specifically for PostgreSQL a

magicstack 5.8k Dec 31, 2022
Simple Python demo app that connects to an Oracle DB.

Cloud Foundry Sample Python Application Connecting to Oracle Simple Python demo app that connects to an Oracle DB. The app is based on the example pro

Daniel Buchko 1 Jan 10, 2022
A Relational Database Management System for a miniature version of Twitter written in MySQL with CLI in python.

Mini-Twitter-Database This was done as a database design course project at Amirkabir university of technology. This is a relational database managemen

Ali 12 Nov 23, 2022
aiosql - Simple SQL in Python

aiosql - Simple SQL in Python SQL is code. Write it, version control it, comment it, and run it using files. Writing your SQL code in Python programs

Will Vaughn 1.1k Jan 08, 2023
python-beryl, a Python driver for BerylDB.

python-beryl, a Python driver for BerylDB.

BerylDB 3 Nov 24, 2021
Apache Libcloud is a Python library which hides differences between different cloud provider APIs and allows you to manage different cloud resources through a unified and easy to use API

Apache Libcloud - a unified interface for the cloud Apache Libcloud is a Python library which hides differences between different cloud provider APIs

The Apache Software Foundation 1.9k Dec 25, 2022