A Python script for scraping company overviews and reviews from Glassdoor.

Last update: Jun 23, 2022

Overview

Data Scraping for Glassdoor

This is a Python script that scrapes company overviews and reviews from Glassdoor. Please use it carefully and follow Glassdoor's Terms of Service, which explicitly prohibit web scraping.

Built With

Python
ChromeDriver


Getting Started

Download the SeleniumGlassdor.py file and change the chromedriver path in it to match your machine. Supply your own file containing the list of company Glassdoor URLs; the company URL CSV file is also attached here. That file was generated with Selenium as well, by searching for 'glassdoor' + company name in Google and extracting the URL from the first result. I can also upload the file on request.
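As a rough sketch of the two steps described above, the helpers below build the Google query for a company and read the company URL list back from a CSV file. The function names, the `url` column name, and the file layout are assumptions for illustration; the attached CSV may use a different header.

```python
import csv
from urllib.parse import quote_plus


def google_search_url(company_name):
    """Build a Google search URL for 'glassdoor' + company name."""
    query = quote_plus(f"glassdoor {company_name}")
    return f"https://www.google.com/search?q={query}"


def load_company_urls(csv_path, column="url"):
    """Return the non-empty values of one column from a CSV file.

    The column name "url" is hypothetical; adjust it to the real header.
    """
    urls = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Short rows yield None for missing fields, so fall back to "".
            value = (row.get(column) or "").strip()
            if value:
                urls.append(value)
    return urls
```

For example, `google_search_url("Acme Corp")` gives the search page whose first result would be inspected, and `load_company_urls("companies.csv")` returns the URLs to visit one by one (the file name here is a placeholder).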

Prerequisites

Install Selenium before running the script.

selenium
```
pip install selenium
```

For the other sections

If you want to scrape data from other sections, such as jobs or salaries, you can first extract each section's URL with the calls below and then download that section with a method similar to the one used for reviews.

from selenium.webdriver.common.by import By  # find_element_by_xpath was removed in Selenium 4.3

reviewsUrl = browser.find_element(By.XPATH, "//a[@data-label='Reviews']").get_attribute('href')
jobsUrl = browser.find_element(By.XPATH, "//a[@data-label='Jobs']").get_attribute('href')
salariesUrl = browser.find_element(By.XPATH, "//a[@data-label='Salaries']").get_attribute('href')
interviewsUrl = browser.find_element(By.XPATH, "//a[@data-label='Interviews']").get_attribute('href')
benefitsUrl = browser.find_element(By.XPATH, "//a[@data-label='Benefits']").get_attribute('href')
photosUrl = browser.find_element(By.XPATH, "//a[@data-label='Photos']").get_attribute('href')
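
The six lookups above differ only in the `data-label` value, so they can be folded into one loop. The sketch below is an illustration under stated assumptions rather than part of the original script: `section_urls`, `section_xpath`, and `SECTION_LABELS` are hypothetical names, and it uses the Selenium 4 call `find_element("xpath", ...)`, where the string `"xpath"` is the value of `By.XPATH`. It expects `browser` to be a WebDriver that has already loaded a company page.

```python
SECTION_LABELS = ("Reviews", "Jobs", "Salaries", "Interviews", "Benefits", "Photos")


def section_xpath(label):
    """XPath for the navigation link of one section, as in the lines above."""
    return f"//a[@data-label='{label}']"


def section_urls(browser, labels=SECTION_LABELS):
    """Map each section label to the href of its navigation link.

    `browser` is assumed to be a Selenium 4 WebDriver; "xpath" is the
    string value of selenium.webdriver.common.by.By.XPATH.
    """
    urls = {}
    for label in labels:
        link = browser.find_element("xpath", section_xpath(label))
        urls[label] = link.get_attribute("href")
    return urls
```

A call such as `urls = section_urls(browser)` followed by `browser.get(urls["Salaries"])` would then open the salaries section. If a page lacks one of the links, Selenium raises `NoSuchElementException`, so a caller may want to pass only the labels it needs.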

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request


License

Distributed under the MIT License. See LICENSE.txt for more information.


Contact

Houping - [email protected]

