This repo is dedicated to the data extraction and manipulation of the World Bank's database called STEP.

Last update: Jan 20, 2022

Overview

Welcome to the Step-X repository. This repo is dedicated to extracting and manipulating data from the World Bank's STEP database. The installation and usage process is explained below in this readme.

The extractor was created using the following technologies:

Python 3.8
Pandas
Geckodriver
Selenium
MongoDB

Installation process

To install and prepare the Step-X environment, follow these instructions in order. First:

Install the Geckodriver
Install the Firefox web browser
Install Anaconda and create an environment to proceed with the next steps (if you wish, you can skip this step)
Install MongoDB in your machine or server

Once the tools listed above are installed, install the Python libraries used in this project by running the command below:

conda create --name <env_name> --file requirements.txt

This command creates a new conda environment and installs the libraries into it. Your workspace is then ready to run the extractor, once you have completed the configuration steps described in the next section.

Configuration process

Before starting the extraction, some configuration is required, such as the World Bank credentials and the list of projects for which the extractor will retrieve data. All of this configuration lives in the file environment.py. To set the World Bank credentials, replace the value of the variable wb_credentials with your own, as in the example below:

wb_credentials = {"email": '[email protected]', 'password': 'password123'}

The geckodriver path is also needed for Selenium to work properly. To set it, replace the value of the variable geckodriver_path with the driver's location:

geckodriver_path = r'/Users/userName/webdriverLocationFolder/geckodriver'
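Before starting a run, it can help to confirm that the configured path really points at an executable driver. The following is a minimal sketch using only the standard library; check_geckodriver is a hypothetical helper written for illustration and is not part of Step-X.

```python
import os


def check_geckodriver(path):
    """Return None if `path` looks usable, otherwise a short problem description.

    Hypothetical helper for illustration; not part of the Step-X code base.
    """
    if not path:
        return "geckodriver_path is empty"
    if not os.path.isfile(path):
        return f"no file found at {path}"
    if not os.access(path, os.X_OK):
        return f"{path} is not executable"
    return None


if __name__ == "__main__":
    # Same example value as shown for environment.py.
    geckodriver_path = r'/Users/userName/webdriverLocationFolder/geckodriver'
    problem = check_geckodriver(geckodriver_path)
    print(problem or "geckodriver path looks fine")
```

Running this check once after editing environment.py may surface a wrong folder or a missing execute permission before Selenium tries to launch Firefox.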

The next step is to set the database name and URL in environment.py, as in the example below:

database_name = "stepX"
database_url = "localhost"
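This readme does not say how Step-X passes database_url to MongoDB. As a rough sketch, assuming the value is either a bare host name (as in the example) or a full MongoDB connection string, a hypothetical helper such as build_mongo_uri could normalize it. The port 27017 is MongoDB's default; everything else here is an assumption.

```python
def build_mongo_uri(database_url, default_port=27017):
    """Turn a bare host such as "localhost" into a mongodb:// URI.

    Hypothetical helper for illustration; not part of the Step-X code base.
    Values that already look like a MongoDB URI are returned unchanged.
    """
    host = database_url.strip()
    if host.startswith(("mongodb://", "mongodb+srv://")):
        return host
    # Append the default port only when no port was given.
    if ":" not in host:
        host = f"{host}:{default_port}"
    return f"mongodb://{host}"


if __name__ == "__main__":
    database_name = "stepX"
    database_url = "localhost"
    print(build_mongo_uri(database_url), database_name)
```

This sketch does not handle IPv6 literals or credentials embedded in the host; it only covers the simple host and host:port cases.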

Finally, set the list of projects you wish to extract and manipulate, as in the example:

PROJECTS_LIST = ['PROJECT_ID']
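When the project list is pasted in from elsewhere, stray whitespace, empty entries, or duplicates can slip in. The sketch below shows one way to tidy the list before the extractor uses it. clean_project_ids is a hypothetical helper, and the IDs in the example are made-up placeholders, not real STEP projects.

```python
def clean_project_ids(projects):
    """Strip whitespace, drop empty entries and duplicates, keep first-seen order.

    Hypothetical helper for illustration; not part of the Step-X code base.
    """
    seen = set()
    cleaned = []
    for raw in projects:
        project_id = str(raw).strip()
        if project_id and project_id not in seen:
            seen.add(project_id)
            cleaned.append(project_id)
    return cleaned


# Placeholder IDs, invented for this example only.
PROJECTS_LIST = clean_project_ids([" P000001 ", "P000002", "P000001", ""])
print(PROJECTS_LIST)  # ['P000001', 'P000002']
```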

Owner

Keanu Pang
