LinkedIn webscraping - LinkedIn web scraping with Python

Last update: Apr 24, 2022

Overview

linkedin_webscraping

This is the first step of a full project called "LinkedIn Job Posting Analysis". It consists of a data ingestion (Extract and Load) procedure that retrieves information about job requirements in the data fields (Data Science, Data Engineering, Data Analysis, etc.).

I started by navigating to the LinkedIn jobs page and searching for the desired job keyword using Selenium. Once I had found a good number of jobs, I used the BeautifulSoup library to parse the page and get the full link to each announced job post. This is our first function, get_links.
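As a minimal sketch of the link-extraction step, the snippet below parses a small hand-written HTML string with BeautifulSoup. It is not the project's actual get_links code: the "/jobs/view/" URL filter, the removal of query parameters, and the sample markup are all assumptions made for illustration. In the real flow, the HTML would come from Selenium (for example, the driver's page_source attribute) after the search results have loaded.

```python
from bs4 import BeautifulSoup


def get_links(page_html: str) -> list[str]:
    """Return the unique job-post URLs found in a search-results page."""
    soup = BeautifulSoup(page_html, "html.parser")
    links = []
    # ASSUMPTION: job posts are anchors whose href contains "/jobs/view/".
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"].split("?")[0]  # drop query parameters
        if "/jobs/view/" in href and href not in links:
            links.append(href)
    return links


# Hypothetical snippet standing in for the HTML that Selenium would provide.
sample_html = """
<ul>
  <li><a href="https://www.linkedin.com/jobs/view/111?refId=abc">Data Engineer</a></li>
  <li><a href="https://www.linkedin.com/jobs/view/222">Data Analyst</a></li>
  <li><a href="https://www.linkedin.com/jobs/view/111?refId=xyz">Data Engineer</a></li>
  <li><a href="https://www.linkedin.com/company/example">Company page</a></li>
</ul>
"""

job_links = get_links(sample_html)
print(job_links)
```

Keeping the parsing in a function that takes a plain HTML string makes it easy to try on saved pages without opening a browser.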

Then, looping through that list and using BeautifulSoup, I was able to get the Job Title, Company Name, Job Location and Job Description for each job link. After some filtering of the descriptions list, the retrieved data was put into a dictionary and turned into a Pandas DataFrame. This is our second function, jobs_dataframe, and it returns something like this:
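To make the second step more concrete, here is a rough sketch of how four fields could be pulled from each job page and collected into a DataFrame. It only approximates jobs_dataframe: the CSS class names, the whitespace-collapsing "filter", the parse_job helper, and the sample pages are hypothetical. The function here also takes already-downloaded HTML strings instead of fetching each link, so that it can run offline.

```python
import pandas as pd
from bs4 import BeautifulSoup

# ASSUMPTION: these CSS class names are placeholders, not LinkedIn's real markup.
FIELDS = {
    "title": "job-title",
    "company": "company-name",
    "location": "job-location",
    "description": "job-description",
}


def parse_job(page_html: str) -> dict:
    """Extract the four fields from one job page; missing fields become None."""
    soup = BeautifulSoup(page_html, "html.parser")
    row = {}
    for column, css_class in FIELDS.items():
        tag = soup.find(class_=css_class)
        # Collapse runs of whitespace so each value becomes single-line text.
        row[column] = " ".join(tag.get_text(" ").split()) if tag else None
    return row


def jobs_dataframe(pages: list[str]) -> pd.DataFrame:
    """Build a dictionary of columns from the parsed pages and wrap it in a DataFrame."""
    rows = [parse_job(page) for page in pages]
    data = {column: [row[column] for row in rows] for column in FIELDS}
    return pd.DataFrame(data)


# Hypothetical pages standing in for the HTML fetched from each job link.
pages = [
    """<h1 class="job-title">Data Engineer</h1>
       <span class="company-name">Acme</span>
       <span class="job-location">Lisbon, Portugal</span>
       <div class="job-description"><p>Build  pipelines.</p><p>SQL required.</p></div>""",
    """<h1 class="job-title">Data Analyst</h1>
       <span class="company-name">Globex</span>
       <span class="job-location">Remote</span>
       <div class="job-description"><p>Dashboards and reporting.</p></div>""",
]

df = jobs_dataframe(pages)
print(df)
```

Each row of the resulting DataFrame corresponds to one job post, with one column per extracted field.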

Finally, after some small validation, the data is ready to be stored in a database. For this, I created a SQLite connection and a table, using the sqlalchemy library to write SQL in Python. We can see the results in the picture below:
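The storage step could look roughly like the sketch below. Note that it uses Python's built-in sqlite3 module rather than sqlalchemy, purely so the example runs without extra installs; the project itself uses sqlalchemy. The table layout, the validate rule (drop rows without a title or description), and the sample rows are assumptions, since the source does not spell out its validation or schema.

```python
import sqlite3

# Hypothetical rows, shaped like the records produced by the scraping step.
jobs = [
    {"title": "Data Engineer", "company": "Acme",
     "location": "Lisbon, Portugal", "description": "Build pipelines."},
    {"title": "Data Analyst", "company": "Globex",
     "location": "Remote", "description": "Dashboards and reporting."},
    {"title": None, "company": "Initech",
     "location": "Remote", "description": "Row with a missing title."},
]


def validate(rows: list[dict]) -> list[dict]:
    """ASSUMED rule: keep only rows with a non-empty title and description."""
    return [row for row in rows if row.get("title") and row.get("description")]


def store_jobs(rows: list[dict], db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the jobs table if needed and insert the given rows."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               company TEXT,
               location TEXT,
               description TEXT NOT NULL
           )"""
    )
    # Named placeholders are filled from each dictionary's keys.
    conn.executemany(
        "INSERT INTO jobs (title, company, location, description) "
        "VALUES (:title, :company, :location, :description)",
        rows,
    )
    conn.commit()
    return conn


conn = store_jobs(validate(jobs))
stored_titles = [row[0] for row in conn.execute("SELECT title FROM jobs ORDER BY id")]
print(stored_titles)
```

Passing a file path instead of ":memory:" as db_path would write the table to disk, where a tool such as DBeaver can open it.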

Although we can already do some Data Analysis, and maybe some Machine Learning, with the data we have, I want to stress that this is an ongoing project, for a few reasons:

First, I want to migrate this data from SQLite to a PostgreSQL database (so I have more freedom to edit it) and create relational tables, with an efficient way to relate them;
Second, it may be possible to refine the description column a little more and normalize the whole table;
Last but not least, as I said earlier, this is just the first step of a bigger project. So we'll probably make a lot of changes along the way, even though we may still use the EtLT pattern to do the engineering.

Dependencies

This project was made using Python 3.10.0.

Executing

To run this project you'll need, in addition to Python, ChromeDriver and SQLite (plus their Python libraries) installed on your computer or in a virtual environment, and chromedriver.exe in your project's folder. Then, run the linkedin_scraper.py file in your terminal window. Next, open the scraping_jobs notebook and set the job_keyword variable to the keyword you're interested in. Finally, run all cells, and you're ready to open the data you've just collected in your database administration tool (mine is DBeaver).

Author

Pedro Dib

Thanks

Thanks a lot to Igor Magalhães for the project idea, and for helping me with tips on writing good code and best practices on documentation.
