Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

This tutorial shows how to integrate Oxylabs' Residential Proxies with AIOHTTP, an asynchronous HTTP client/server library for Python.

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library yet, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials along with the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second way is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the user and pass placeholder values with your Oxylabs account credentials.

Testing Proxies

To see if the proxy is working, send a request to https://ip.oxylabs.io through it, for example by running either snippet above. If everything is working correctly, the response will contain the IP address of the proxy that you're currently using.
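For a quick sanity check from Python, you can also compare your own IP address with the proxy-assigned one. Below is a minimal sketch that assumes the same USER, PASSWORD, and END_POINT placeholder values used earlier:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check_ip():
    async with aiohttp.ClientSession() as session:
        # Request without a proxy - returns your own IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Direct IP:", await resp.text())
        # Request through the proxy - should print a different IP address.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxy IP:", await resp.text())


asyncio.run(check_ip())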

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once with little risk of hitting CAPTCHAs or getting blocked, which makes web scraping fast and efficient: you can extract data from thousands of products in a matter of seconds. Here's the full script:

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Strip the relative "../.." prefix so the path can be joined
            # with the catalogue base URL later.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The rating <p> carries classes like ["star-rating", "Three"];
            # the second class is the star count.
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows, Python 3.8+ uses an event loop that can raise a
    # "RuntimeError: Event loop is closed" error with aiohttp, so a
    # selector-based event loop policy is loaded instead.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and run pip:

pip install -r requirements.txt
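For reference, the script relies on the following third-party packages, so the requirements file would contain entries along these lines (exact pinned versions may differ):

aiohttp
beautifulsoup4
lxml
pandas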

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
