Scrapping Connections' info on Linkedin

Overview

Scrap It!

! Disclaimer:

  1. THIS CODE HAS BEEN IMPLEMENTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE INTERVIEW PROCESS OF MCI.IR AND INTERVIEWEES WERE SUPPOSED TO PUSH THE CODE ON THEIR GITHUB. CONTACT ME TO REMOVE THIS REPOSITORY, IN CASE IT IS AGAINST YOUR TOS.
  2. IF ANY CONNECTION IS NOT OK TO THEIR CONTACT INFO BE HERE, CONTACT ME TO REMOVE THEM ASAP.

Functionalities:

This script automatically:

  • opens your Linkedin profile
  • accesses your connections page
  • crawls the page for grabbing their profile links
  • scraps each person's information and dumps it to Sqlite db
  • and simultaneously logs all necessary level of info into Linkedin.log

DataFlowDiagram

db

Enlisted desing patterns are (but not limited to):

  • Creator
  • Low Coupling
  • High Cohesion
  • Indirection
  • Modularization
  • Information Expert

Log/DB files:

db log log2

Further develepments notes:

  • Check out other DBs that supports multithreading which anable us dumpping all information rows at once
  • change IP per request (You can find its code on my "Social Media Computing course" repository)
  • Sometimes you need to scroll down manually when "connection" page is being loaded. You can add one line code to scroll down for you.

References:

https://www.linkedin.com/pulse/how-easy-scraping-data-from-linkedin-profiles-david-craven

https://www.geeksforgeeks.org/scrape-linkedin-using-selenium-and-beautiful-soup-in-python/

https://stackoverflow.com/questions/28883769/remove-odd-indexed-elements-from-list-in-python#:~:text=Fun%20fact%3A%20to%20remove%20all,remove(x)%20.

https://stackoverflow.com/questions/34759787/fetch-all-href-link-using-selenium-in-python

https://www.tutorialspoint.com/fetch-all-href-link-using-selenium-in-python

https://stackoverflow.com/questions/64717302/deprecationwarning-executable-path-has-been-deprecated-selenium-python

https://chromedriver.chromium.org/home

https://www.youtube.com/watch?v=-ARI4Cz-awo

Owner
MohammadReza Ardestani
CS @ AUT
MohammadReza Ardestani
Command line program to download documents from web portals.

command line document download made easy Highlights list available documents in json format or download them filter documents using string matching re

16 Dec 26, 2022
Discord webhook spammer with proxy support and proxy scraper

Discord webhook spammer with proxy support and proxy scraper

3 Feb 27, 2022
A Powerful Spider(Web Crawler) System in Python.

pyspider A Powerful Spider(Web Crawler) System in Python. Write script in Python Powerful WebUI with script editor, task monitor, project manager and

Roy Binux 15.7k Jan 04, 2023
12306抢票脚本

12306抢票脚本

罐子里的茶 457 Jan 05, 2023
for those who dont want to pay $10/month for high school game footage with ads

nfhs-scraper Disclaimer: I am in no way responsible for what you choose to do with this script and guide. I do not endorse avoiding paywalls or any il

Conrad Crawford 5 Apr 12, 2022
ChromiumJniGenerator - Jni Generator module extracted from Chromium project

ChromiumJniGenerator - Jni Generator module extracted from Chromium project

allenxuan 4 Jun 12, 2022
Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

Pythonic Crawling / Scraping Framework Built on Eventlet Features High Speed WebCrawler built on Eventlet. Supports relational databases engines like

Juan Manuel Garcia 173 Dec 05, 2022
Python Web Scrapper Project

Web Scrapper Projeto desenvolvido em python, sobre tudo com Selenium, BeautifulSoup e Pandas é um web scrapper que puxa uma tabela com as principais e

Jordan Ítalo Amaral 2 Jan 04, 2022
Tool to scan for secret files on HTTP servers

snallygaster Finds file leaks and other security problems on HTTP servers. what? snallygaster is a tool that looks for files accessible on web servers

Hanno Böck 2k Dec 28, 2022
淘宝、天猫半价抢购,抢电视、抢茅台,干死黄牛党

taobao_seckill 淘宝、天猫半价抢购,抢电视、抢茅台,干死黄牛党 依赖 安装chrome浏览器,根据浏览器的版本找到对应的chromedriver下载安装 web版使用说明 1、抢购前需要校准本地时间,然后把需要抢购的商品加入购物车 2、如果要打包成可执行文件,可使用pyinstalle

2k Jan 05, 2023
淘宝茅台抢购最新优化版本,淘宝茅台秒杀,优化了茅台抢购线程队列

淘宝茅台抢购最新优化版本,淘宝茅台秒杀,优化了茅台抢购线程队列

MaoTai 118 Dec 16, 2022
京东云无线宝积分推送,支持查看多设备积分使用情况

JDRouterPush 项目简介 本项目调用京东云无线宝API,可每天定时推送积分收益情况,帮助你更好的观察主要信息 更新日志 2021-03-02: 查询绑定的京东账户 通知排版优化 脚本检测更新 支持Server酱Turbo版 2021-02-25: 实现多设备查询 查询今

雷疯 199 Dec 12, 2022
Audio media crawler for lbry.

Audio media crawler for lbry. Requirements Python 3.8 Poetry 1.1.7 Elasticsearch 7.14.0 Lbry-sdk 0.99.0 Development This project uses poetry as a depe

Hound.fm 4 Dec 03, 2022
Html Content / Article Extractor, web scrapping lib in Python

Python-Goose - Article Extractor Intro Goose was originally an article extractor written in Java that has most recently (Aug2011) been converted to a

Xavier Grangier 3.8k Jan 02, 2023
A web service for scanning media hosted by a Matrix media repository

Matrix Content Scanner A web service for scanning media hosted by a Matrix media repository Installation TODO Development In a virtual environment wit

Brendan Abolivier 5 Dec 01, 2022
A tool to easily scrape youtube data using the Google API

YouTube data scraper To easily scrape any data from the youtube homepage, a youtube channel/user, search results, playlists, and a single video itself

7 Dec 03, 2022
A python script to extract answers to any question on Quora (Quora+ included)

quora-plus-bypass A python script to extract answers to any question on Quora (Quora+ included) Requirements Python 3.x

Nitin Narayanan 10 Aug 18, 2022
Scraping followers of an instagram account

ScrapInsta A script to scraping data from Instagram Install First of all you can run: pip install scrapinsta After that you need to install these requ

Matheus Kolln 1 Sep 05, 2021
Linkedin webscraping - Linkedin web scraping with python

linkedin_webscraping This is the first step of a full project called "LinkedIn J

Pedro Dib 4 Apr 24, 2022
This is a web crawler that works on employ email data by gmane.org and visualizes it in different ways.

crawler_to_visual_gmane Analyzing an EMAIL Archive from gmane and vizualizing the data using the D3 JavaScript library. This is a set of tools that al

Saim Zafar 1 Dec 20, 2021