Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
Example code for the book Fluent Python, 1st Edition (O'Reilly, 2015)

Fluent Python, First Edition: example code This repository is archived and will not be updated.

Fluent Python 5.4k Jan 09, 2023
Short, introductory guide for the Python programming language

100 Page Python Intro This book is a short, introductory guide for the Python programming language.

Sundeep Agarwal 185 Dec 26, 2022
NGEBUG is a tool that sends viruses to victims

Ngebug NGEBUG adalah tools pengirim virus ke korban NGEBUG adalah tools virus terbaru yang berasal dari rusia Informasi lengkap ada didalam tools Run

Profesor Acc 3 Dec 13, 2021
Repositório contendo atividades no curso de desenvolvimento de sistemas no SENAI

SENAI-DES Este é um repositório contendo as atividades relacionadas ao curso de desenvolvimento de sistemas no SENAI. Se é a primeira vez em contato c

Abe Hidek 4 Dec 06, 2022
BMI-Calculator: Program to Calculate Body Mass Index (BMI)

The Body Mass Index (BMI) or Quetelet index is a value derived from the mass (weight) and height of an individual, male or female.

PyLaboratory 0 Feb 07, 2022
Paintbot - Forward & Inverse Kinematics

PAINTBOT - FORWARD & INVERSE KINEMATICS: Overview: We built a simulation of a RRR robot shown in the figure below. The robot has 3 links and is connec

Alex Lin 1 Oct 21, 2021
TurtleBot Control App - TurtleBot Control App With Python

TURTLEBOT CONTROL APP INDEX: 1. Introduction 2. Environments 2.1. Simulated Envi

Rafanton 4 Aug 03, 2022
A gamey, snakey esoteric programming language

Snak Snak is an esolang based on the classic snake game. Installation You will need python3. To use the visualizer, you will need the curses module. T

David Rutter 3 Oct 10, 2022
Collection of Beginner to Intermediate level Python scripts contributed by members and participants.

Hacktoberfest2021-Python Hello there! This repository contains a 'Collection of Beginner to Intermediate level Python projects', created specially for

12 May 25, 2022
Better Giveaways is a bot that will change the experience of using a giveaway bot forever.

Better-Giveaways Better Giveaways is a bot that will change the experience of using a giveaway bot forever. VoxelBotUtils/Novus, latest PyPi releases

Lightning 2 Jan 12, 2022
WATTS provides a set of Python classes that can manage simulation workflows for multiple codes where information is exchanged at a coarse level

WATTS (Workflow and Template Toolkit for Simulation) provides a set of Python classes that can manage simulation workflows for multiple codes where information is exchanged at a coarse level.

13 Dec 23, 2022
Pokehandy - Data web app sobre Pokémon TCG que desarrollo durante transmisiones de Twitch, 2022

⚡️ Pokéhandy – Pokémon Hand Simulator [WIP 🚧 ] This application aims to simulat

Rodolfo Ferro 5 Feb 23, 2022
Table (Finnish Taulukko) glued together to transform into hands-free living.

taulukko Table (Finnish Taulukko) glued together to transform into hands-free living. Installation Preferred way to install is as usual (for testing o

Stefan Hagen 2 Dec 14, 2022
Interfaces between napari and pymeshlab library to allow import, export and construction of surfaces.

napari-pymeshlab Interfaces between napari and the pymeshlab library to allow import, export and construction of surfaces. This is a WIP and feature r

Zach Marin 4 Oct 12, 2022
This is an independent project to track Nubank expenses

Nubank expense tracker This is an independent project to track Nubank expenses. To fetch Nubank data we are going to use an unofficial Nubank API, tha

Ramon Gazoni Lacerda 0 Aug 28, 2022
ELF file deserializer and serializer library

elfo ELF file deserializer and serializer library. import elfo elf = elfo.ELF.from_path('main') elf ELF( header=ELFHeader( e_ident=e

Filipe Laíns 3 Aug 23, 2021
Python interface to IEX and IEX cloud APIs

Python interface to IEX Cloud Referral Please subscribe to IEX Cloud using this referral code. Getting Started Install Install from pip pip install py

IEX Cloud 41 Dec 21, 2022
Get you an ultimate lexer generator using Fable; port OCaml sedlex to FSharp, Python and more!

NOTE: currently we support interpreted mode and Python source code generation. It's EASY to compile compiled_unit into source code for C#, F# and othe

Taine Zhao 15 Aug 06, 2022
A simple armature retargeting tool for Blender

Simple-Retarget-Tool-Blender A simple armature retargeting tool for Blender Update V2: Set Rest Pose to easily apply rest pose. Preset Import/Export.

Fahad Hasan Pathik 74 Jan 04, 2023
Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a local folder

Ingestinator Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a

Henry Wilkinson 2 Nov 18, 2022