Scraping and analysis of leetcode-compensations page.

Overview

Leetcode compensations report

[Figures: Salary Distribution; Salary]

Report

INDIA : 5th Jan 2019 - 5th Aug 2021 / fixed salary

INDIA : 5th Jan 2019 - 5th Aug 2021 / fixed salary, dark mode

INDIA : 5th Jan 2019 - 5th Aug 2021 / total salary

INDIA : 5th Jan 2019 - 5th Aug 2021 / total salary, dark mode

Directory structure

  • data
    • imgs - images for reports
    • logs - scraping logs
    • mappings - standardized company, location and title mappings as well as unmapped entities
    • meta - meta information for the posts like post_id, date, title, href.
    • out - data from info.all_info.get_clean_records_for_india()
    • posts - text from the post
    • reports - salary analysis by companies, titles and experience
  • info - functions to get the posts data (along with the standardized entities) in a tabular format
  • leetcode - scraper
  • utils - constants and helper methods

Setup

  1. Clone the repo.
  2. Put the chromedriver in the utils directory.
  3. Set up a virtual environment: python -m venv leetcode.
  4. Install the necessary packages: pip install -r requirements.txt.
  5. To create the reports, run npm install vega-lite vega-cli canvas (needed to save altair plots).

Scraping

$ export PYTHONPATH=<project_directory>
$ python leetcode/posts_meta.py --till_date 2021/08/03

# sample output
2021-08-03 19:36:07.474 | INFO     | __main__:<module>:48 - page no: 1 | # posts: 15
$ python leetcode/posts.py

# sample output
2021-08-03 19:36:25.997 | INFO     | __main__:<module>:45 - post_id: 1380805 done!
2021-08-03 19:36:28.995 | INFO     | __main__:<module>:45 - post_id: 1380646 done!
2021-08-03 19:36:31.631 | INFO     | __main__:<module>:45 - post_id: 1380542 done!
2021-08-03 19:36:34.727 | INFO     | __main__:<module>:45 - post_id: 1380068 done!
2021-08-03 19:36:37.280 | INFO     | __main__:<module>:45 - post_id: 1379990 done!
2021-08-03 19:36:40.509 | INFO     | __main__:<module>:45 - post_id: 1379903 done!
2021-08-03 19:36:41.096 | WARNING  | __main__:<module>:34 - sleeping extra for post_id: 1379487
2021-08-03 19:36:44.530 | INFO     | __main__:<module>:45 - post_id: 1379487 done!
2021-08-03 19:36:47.115 | INFO     | __main__:<module>:45 - post_id: 1379208 done!
2021-08-03 19:36:49.660 | INFO     | __main__:<module>:45 - post_id: 1378689 done!
2021-08-03 19:36:50.470 | WARNING  | __main__:<module>:34 - sleeping extra for post_id: 1378620
2021-08-03 19:36:53.866 | INFO     | __main__:<module>:45 - post_id: 1378620 done!
2021-08-03 19:36:57.203 | INFO     | __main__:<module>:45 - post_id: 1378334 done!
2021-08-03 19:37:00.570 | INFO     | __main__:<module>:45 - post_id: 1378288 done!
2021-08-03 19:37:03.226 | INFO     | __main__:<module>:45 - post_id: 1378181 done!
2021-08-03 19:37:05.895 | INFO     | __main__:<module>:45 - post_id: 1378113 done!
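
The WARNING lines in the sample output suggest that the scraper waits longer and tries again when a post has not loaded in time. The following is a minimal sketch of that pattern, not the repository's actual code: `fetch_post` is a hypothetical callable that returns the post text or `None`, and both delay values are assumptions.

```python
import time
from typing import Callable, Optional


def fetch_with_extra_sleep(
    post_id: str,
    fetch_post: Callable[[str], Optional[str]],  # hypothetical: returns text, or None if not loaded
    base_delay: float = 2.0,   # assumed pause before reading the page
    extra_delay: float = 3.0,  # assumed additional pause before the single retry
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Fetch one post, sleeping extra and retrying once if the first read fails."""
    sleep(base_delay)
    text = fetch_post(post_id)
    if text is None:
        # Mirrors the "sleeping extra" warning seen in the sample output.
        print(f"sleeping extra for post_id: {post_id}")
        sleep(extra_delay)
        text = fetch_post(post_id)
    if text is not None:
        print(f"post_id: {post_id} done!")
    return text
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in a real run the default `time.sleep` would apply.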

Report DataFrame

$ ipython

In [1]: from info.all_info import get_clean_records_for_india                                                               
In [2]: df = get_clean_records_for_india()                                                                                  
2021-08-04 15:47:11.615 | INFO     | info.all_info:get_raw_records:95 - n records: 4134
2021-08-04 15:47:11.616 | WARNING  | info.all_info:get_raw_records:97 - missing post_ids: ['1347044', '1193859', '1208031', '1352074', '1308645', '1206533', '1309603', '1308672', '1271172', '214751', '1317751', '1342147', '1308728', '1138584']
2021-08-04 15:47:11.696 | WARNING  | info.all_info:_save_unmapped_labels:54 - 35 unmapped company saved
2021-08-04 15:47:11.705 | WARNING  | info.all_info:_save_unmapped_labels:54 - 353 unmapped title saved
2021-08-04 15:47:11.708 | WARNING  | info.all_info:get_clean_records_for_india:122 - 1779 rows dropped(location!=india)
2021-08-04 15:47:11.709 | WARNING  | info.all_info:get_clean_records_for_india:128 - 385 rows dropped(incomplete info)
2021-08-04 15:47:11.710 | WARNING  | info.all_info:get_clean_records_for_india:134 - 7 rows dropped(internships)
In [3]: df.shape                                                                                                            
Out[3]: (1963, 14)
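
The log reports three filtering steps, and the counts agree with the final shape: 4134 - 1779 - 385 - 7 = 1963 rows. The sketch below mirrors those steps on plain dictionaries. It is an illustration only, not the code in info.all_info: the field names (`country`, `company`, `title`, `yoe`, `salary`), the set of required fields, and the internship check are all assumptions.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]

# Assumed set of fields a record needs to count as complete.
REQUIRED_FIELDS = ("company", "title", "yoe", "salary")


def clean_records_for_india(records: List[Record]) -> List[Record]:
    """Apply the three filters named in the log, printing how many rows each drops."""
    # Step 1: keep only records located in India (hypothetical `country` field).
    india = [r for r in records if r.get("country") == "india"]
    print(f"{len(records) - len(india)} rows dropped(location!=india)")

    # Step 2: drop records missing any required field.
    # `is not None` is used so that a legitimate 0.0 years of experience is kept.
    complete = [
        r for r in india
        if all(r.get(field) is not None for field in REQUIRED_FIELDS)
    ]
    print(f"{len(india) - len(complete)} rows dropped(incomplete info)")

    # Step 3: drop internships (assumed to be identifiable from the title).
    full_time = [r for r in complete if "intern" not in str(r["title"]).lower()]
    print(f"{len(complete) - len(full_time)} rows dropped(internships)")
    return full_time
```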

Report

$ python reports/plots.py # generate fixed comp. plots
$ python reports/report.py # fixed comp.
$ python reports/report_dark.py # fixed comp., dark mode

$ python reports/plots_tc.py # generate total comp. plots
$ python reports/report_tc.py # total comp.
$ python reports/report_dark.py # total comp., dark mode

Samples

title : Flipkart | Software Development Engineer-1 | Bangalore
url : https://leetcode.com/discuss/compensation/834212/Flipkart-or-Software-Development-Engineer-1-or-Bangalore
company : flipkart
title : sde 1
yoe : 0.0 years
salary : ₹ 1800000.0
location : bangalore
post Education: B.Tech from NIT (2021 passout) Years of Experience: 0 Prior Experience: Fresher Date of the Offer: Aug 2020 Company: Flipkart Title/Level: Software Development Engineer-1 Location: Bangalore Salary: INR 18,00,000 Performance Incentive: INR 1,80,000 (10% of base pay) ESOPs: 48 units => INR 5,07,734 (vested over 4 years. 25% each year) Relocation Reimbursement: INR 40,000 Telephone Reimbursement: INR 12,000 Home Broadband Reimbursement: INR 12,000 Gratuity: INR 38,961 Insurance: INR 27,000 Other Benefits: INR 40,000 (15 days accomodation + travel) (this is different from the relocation reimbursement) Total comp (Salary + Bonus + Stock): Total CTC: INR 26,57,695; First year: INR 22,76,895 Other details: Standard Offer for On-Campus Hire Allowed Branches: B.Tech CSE/IT (6.0 CGPA & above) Process consisted of Coding test & 3 rounds of interviews. I don't remember questions exactly. But they vary from topics such as Graph(Topological Sort, Bi-Partite Graph), Trie based questions, DP based questions both recursive and dp approach, trees, Backtracking.

title : Cloudera | SSE | Bangalore | 2019
url : https://leetcode.com/discuss/compensation/388432/Cloudera-or-SSE-or-Bangalore-or-2019
company : cloudera
title : sde 2
yoe : 2.5 years
salary : ₹ 2800000.0
location : bangalore
post Education: MTech from Tier 1 College Years of Experience: 2.5 Prior Experience: SDE at Flipkart Date of the Offer: Sept 10, 2019 Company: Cloudera Title/Level: Senior Software Engineer (SSE) Location: Bangalore, India Salary: Rs 28,00,000 Bonus: Rs 2,80,000 (10 % of base) PF & Gratuity: Rs 1,88,272 Stock bonus: 5000 units over 4 years ($9 per unit) Other Benefits: Rs 4,00,000 (Health, Term Life and Personal Accident Insurance, Annual Medical Health Checkup, Transportation, Education Reimbursement) Total comp (Salary + Bonus + Stock): Rs 4070572

title : Amadeus Labs | MTS | Bengaluru
url : https://leetcode.com/discuss/compensation/1109046/Amadeus-Labs-or-MTS-or-Bengaluru
company : amadeus labs
title : mts 1
yoe : 7.0 years
salary : ₹ 1700000.0
location : bangalore
post Education: B.Tech. in ECE Years of Experience: 7 Prior Experience: Worked at few MNCs Date of the Offer: Jan 2021 Company: Amadeus Labs Title/Level: Member of Technical Staff Location: Bengaluru, India Salary: ₹ 1,700,000 Signing Bonus: ₹ 50,000 Stock bonus: None Bonus: 137,000 Total comp (Salary + Bonus + Stock): ~₹1,887,000 Benefits: Employee and family Insurance
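
Each sample pairs a raw post with the structured fields derived from it. As a rough illustration of how the fixed salary could be pulled out of the post text, here is a minimal regex-based sketch. It is an assumption about one possible approach, not the parser used in this repository, and it only covers the three currency spellings that appear in the samples above (INR, Rs, ₹).

```python
import re
from typing import Optional

# Matches "Salary:" followed by a currency marker and a comma-grouped number.
# Both Indian (18,00,000) and Western (1,700,000) groupings are accepted.
SALARY_RE = re.compile(r"Salary:\s*(?:INR|Rs\.?|₹)\s*([\d,]+)")


def parse_fixed_salary(post: str) -> Optional[float]:
    """Return the first fixed salary found in the post text, or None."""
    match = SALARY_RE.search(post)
    if match is None:
        return None
    digits = match.group(1).replace(",", "")
    return float(digits) if digits else None


print(parse_fixed_salary("Location: Bangalore Salary: INR 18,00,000 Performance Incentive: INR 1,80,000"))
```

Because the pattern requires a colon directly after "Salary", it skips phrases such as "Total comp (Salary + Bonus + Stock)".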

Owner
utsav
Lead MLE @ freshworks