Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool
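To quickly confirm your environment meets these requirements, you can run a short check (a minimal sketch; the version assertion is just illustrative):

import sys

import aiohttp

# Ensure the interpreter is Python 3.6 or higher.
assert sys.version_info >= (3, 6), "Python 3.6 or higher is required."
print(f"aiohttp {aiohttp.__version__} is installed.")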

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials alongside the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())
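With this approach, aiohttp.BasicAuth builds the Proxy-Authorization header for you, so the credentials never have to be embedded in the proxy URL itself.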

The second is to embed the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())
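Keep in mind that aiohttp natively supports only HTTP proxies; if you ever need SOCKS proxies, you'd have to use a third-party connector such as aiohttp-socks.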

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
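Rather than hardcoding the credentials, you may prefer to read them from environment variables. A minimal sketch (the variable names OXYLABS_USER and OXYLABS_PASSWORD are illustrative, not required by Oxylabs):

import os

# Hypothetical environment variable names; set them to your Oxylabs credentials.
USER = os.environ.get("OXYLABS_USER", "user")
PASSWORD = os.environ.get("OXYLABS_PASSWORD", "pass")
END_POINT = "pr.oxylabs.io:7777"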

Testing Proxies

To check whether the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, it will return the IP address of the proxy you're currently using.
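Because Residential Proxies rotate, repeating the request should normally show a different exit IP each time. Here's a minimal sketch that reuses the placeholder credentials from the snippets above:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_ip(session):
    async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
    ) as resp:
        return (await resp.text()).strip()

async def main():
    async with aiohttp.ClientSession() as session:
        # With a rotating proxy pool, the two printed IPs will usually differ.
        print(await check_ip(session))
        print(await check_ip(session))

asyncio.run(main())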

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send many requests at once with little risk of CAPTCHAs or IP blocks, which makes the web scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
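    # Allow at most 4 requests in flight at any one time.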
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, the default ProactorEventLoop can raise an
    # "Event loop is closed" error on exit, so load the selector policy instead.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("Scraping stopped early, but partial results may be available.")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, simply download the requirements.txt file and use pip:

pip install -r requirements.txt
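Judging by the sample script's imports, the file should list at least the following packages (an illustrative approximation, not the exact file):

# Illustrative requirements, derived from the script's imports.
aiohttp
beautifulsoup4
lxml
pandas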

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
