Make Selenium work on Github Actions

Overview

Make Selenium work on Github Actions

Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium?? After you jump through some hoops it's just as simple.

How to use this

I think you can just fork this repository and get to work on scraper.py.

Otherwise, just steal scraper.py and .github/workflows/scrape.yml and you'll be good to go.

Saving CSV files

If you want every scrape to update a CSV or something like this, you'll need to edit your scrape.yml to commit to the repo after the scraper is done running.

Take a look at the bottom of autoscraper-history to see how to do that. It should be one more step, something like this:

      - name: Commit and push if content changed
        run: |-
          git config user.name "Automated"
          git config user.email "[email protected]"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

I'm pretty sure I lifted that code 100% from Simmon Willison.

Scheduling

Right now the yaml is set to only scrape when you click the "Run workflow" button in the Actions tab, but you can add something like

  schedule:
    - cron: '0 * * * *'

To make it run the first minute of every hour.

How this all works

To make Selenium + GitHub Actions work, this repo does a few magic things. In a normal world, you start Chrome like this:

from selenium import webdriver

driver = webdriver.Chrome()

But we.... do things a little differently. Let me walk you through the changes!

webdriver-manager to manage the webdriver

You can add a special action to set up Chromedriver but I feel it's honestly easier to use webdriver-manager. It magically picks out the right version of chromedriver for you.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

It works on your local machine, too!

Chromium, not Chrome

When you're running GitHub Actions, it's probably on a nice little Ubuntu Linux machine. In those situations, you install software using apt. Since you can't install Chrome with apt, you'll install Chromium instead, the open-source version of Chrome. Works the same, just opens a little differently.

As a result, our chromedriver install code changed a little more:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver_path = ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
driver = webdriver.Chrome(driver_path)

This also means we need to add a line that does apt-get install -y chromium-browser to our scrape.yml)

Headless Chromium

Since we don't have a monitor plugged into GitHub Actions, we can't actually see what's going on in the browser. In the olden days you had to construct some odd technical fake screen, but these days you just run in headless mode!

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

Other options and more management of things

To combine the webdriver-manager driver path and the headless Chrome option, there are a lot of hoops to jump through. During the research process a lot of extra chrome options poked up, so I thought hey, let's just add all those, too.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_service = Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

chrome_options = Options()
options = [
    "--headless",
    "--disable-gpu",
    "--window-size=1920,1200",
    "--ignore-certificate-errors",
    "--disable-extensions",
    "--no-sandbox",
    "--disable-dev-shm-usage"
]
for option in options:
    chrome_options.add_argument(option)

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

The biggest change here beyond all those options is the Service thing. Apparently just giving it the path to chromdriver isn't good enough? Who knows, I just do what works.

Owner
Jonathan Soma
baby data journo wrangler @ledeprogram + @littlecolumns, cat wrangler @cat-republic
Jonathan Soma
Switch among Guest VMs organized by Resource Pool

Proxmox PCI Switcher Switch among Guest VMs organized by Resource Pool. main features: ONE GPU card, N OS (at once) Guest VM command client Handler po

Rosiney Gomes Pereira 111 Dec 27, 2022
Redis fixtures and fixture factories for Pytest.

Redis fixtures and fixture factories for Pytest.This is a pytest plugin, that enables you to test your code that relies on a running Redis database. It allows you to specify additional fixtures for R

Clearcode 86 Dec 23, 2022
Doing dirty (but extremely useful) things with equals.

Doing dirty (but extremely useful) things with equals. Documentation: dirty-equals.helpmanual.io Source Code: github.com/samuelcolvin/dirty-equals dir

Samuel Colvin 602 Jan 05, 2023
Silky smooth profiling for Django

Silk Silk is a live profiling and inspection tool for the Django framework. Silk intercepts and stores HTTP requests and database queries before prese

Jazzband 3.7k Jan 04, 2023
pytest plugin to test mypy static type analysis

pytest-mypy-testing — Plugin to test mypy output with pytest pytest-mypy-testing provides a pytest plugin to test that mypy produces a given output. A

David Fritzsche 21 Dec 21, 2022
fsociety Hacking Tools Pack – A Penetration Testing Framework

Fsociety Hacking Tools Pack A Penetration Testing Framework, you will have every script that a hacker needs. Works with Python 2. For a Python 3 versi

Manisso 8.2k Jan 03, 2023
pytest splinter and selenium integration for anyone interested in browser interaction in tests

Splinter plugin for the pytest runner Install pytest-splinter pip install pytest-splinter Features The plugin provides a set of fixtures to use splin

pytest-dev 238 Nov 14, 2022
The Good Old Days. | Testing Out A New Module-

The-Good-Old-Days. The Good Old Days. | Testing Out A New Module- Installation Asciimatics supports Python versions 2 & 3. For the precise list of tes

Syntax. 2 Jun 08, 2022
A complete test automation tool

Golem - Test Automation Golem is a test framework and a complete tool for browser automation. Tests can be written with code in Python, codeless using

486 Dec 30, 2022
Public repo for automation scripts

Script_Quickies Public repo for automation scripts Dependencies Chrome webdriver .exe (make sure it matches the version of chrome you are using) Selen

CHR-onicles 1 Nov 04, 2021
A grab-bag of nifty pytest plugins

A goody-bag of nifty plugins for pytest OS Build Coverage Plugin Description Supported OS pytest-server-fixtures Extensible server-running framework w

Man Group 492 Jan 03, 2023
Testinfra test your infrastructures

Testinfra test your infrastructure Latest documentation: https://testinfra.readthedocs.io/en/latest About With Testinfra you can write unit tests in P

pytest-dev 2.1k Jan 07, 2023
模仿 USTC CAS 的程序,用于开发校内网站应用的本地调试。

ustc-cas-mock 模仿 USTC CAS 的程序,用于开发校内网站应用阶段调试。 请勿在生产环境部署! 只测试了最常用的三个 CAS route: /login /serviceValidate(验证 CAS ticket) /logout 没有测试过 proxy ticket。(因为我

taoky 4 Jan 27, 2022
pytest plugin for testing mypy types, stubs, and plugins

pytest plugin for testing mypy types, stubs, and plugins Installation This package is available on PyPI pip install pytest-mypy-plugins and conda-forg

TypedDjango 74 Dec 31, 2022
A small faсade for the standard python mocker library to make it user-friendly

unittest-mocker Inspired by the pytest-mock, but written from scratch for using with unittest and convenient tool - patch_class Installation pip insta

Vertliba V.V. 6 Jun 10, 2022
Command line driven CI frontend and development task automation tool.

tox automation project Command line driven CI frontend and development task automation tool At its core tox provides a convenient way to run arbitrary

tox development team 3.1k Jan 04, 2023
buX Course Enrollment Automation

buX automation BRACU - buX course enrollment automation Features: Automatically enroll into multiple courses at a time. Find courses just entering cou

Mohammad Shakib 1 Oct 06, 2022
Ab testing - The using AB test to test of difference of conversion rate

Facebook recently introduced a new type of offer that is an alternative to the current type of bidding called maximum bidding he introduced average bidding.

5 Nov 21, 2022
Free cleverbot without headless browser

Cleverbot Scraper Simple free cleverbot library that doesn't require running a heavy ram wasting headless web browser to actually chat with the bot, a

Matheus Fillipe 3 Sep 25, 2022
A pure Python script to easily get a reverse shell

easy-shell A pure Python script to easily get a reverse shell. How it works? After sending a request, it generates a payload with different commands a

Cristian Souza 48 Dec 12, 2022