Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials along with the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to pass the authentication credentials in the proxy URL itself:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To see if the proxy is working, try visiting https://ip.oxylabs.io. If everything is working correctly, it will return the IP address of the proxy that you're currently using.
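For a quick programmatic check, here's a minimal sketch that reuses the same placeholder credentials as the snippets above and prints the IP returned both with and without the proxy (the exact output format of ip.oxylabs.io may vary):

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check_proxy():
    async with aiohttp.ClientSession() as session:
        # Request without the proxy shows your real IP.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Without proxy:", await resp.text())
        # The same request through the proxy should show a different IP.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("With proxy:", await resp.text())


asyncio.run(check_proxy())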

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once without worrying about CAPTCHAs or getting blocked, which makes the web scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds!

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Drop the leading "../.." from the relative product link.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The rating is the second class of the <p class="star-rating ..."> tag.
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)  # Limit concurrency to 4 simultaneous requests.
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, a different event loop policy must be set
    # to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info.minor >= 8:
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, simply download the requirements.txt file and use pip:

pip install -r requirements.txt
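
The requirements.txt file isn't reproduced here, but based on the imports in the sample script, a minimal equivalent would look roughly like this (versions unpinned; treat it as an assumption rather than the project's exact file):

aiohttp
beautifulsoup4
lxml
pandas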

If you're having trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
