a small library for extracting rich content from urls

Last update: Dec 27, 2022

Overview

A small library for extracting rich content from urls.

what does it do?

micawber supplies a few methods for retrieving rich metadata about a variety of links, such as links to YouTube videos. It also provides functions for parsing blocks of text and HTML and replacing links to videos with rich embedded content.

examples

Here is a quick example:

import micawber

# load up rules for some default providers, such as youtube and flickr
providers = micawber.bootstrap_basic()

providers.request('http://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following dictionary:
{
    'author_name': 'pascalbrax',
    'author_url': u'http://www.youtube.com/user/pascalbrax',
    'height': 344,
    'html': u'<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>',
    'provider_name': 'YouTube',
    'provider_url': 'http://www.youtube.com/',
    'title': 'Future Crew - Second Reality demo - HD',
    'type': u'video',
    'thumbnail_height': 360,
    'thumbnail_url': u'http://i2.ytimg.com/vi/54XHDUOHuzU/hqdefault.jpg',
    'thumbnail_width': 480,
    'url': 'http://www.youtube.com/watch?v=54XHDUOHuzU',
    'width': 459,
    'version': '1.0',
}

providers.parse_text('this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following string:
this is a test:
<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>

providers.parse_html('<p>http://www.youtube.com/watch?v=54XHDUOHuzU</p>')

# returns the following html:
<p><iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&amp;feature=oembed" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
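
The text-parsing step above can be illustrated without network access. The following is only a rough sketch of the idea, not micawber's implementation: the names KNOWN_EMBEDS, STANDALONE_URL and embed_standalone_urls are hypothetical, and the dictionary stands in for a provider that would normally fetch the embed HTML over HTTP. It assumes that only URLs appearing alone on a line should be replaced.

```python
import re

# Hypothetical stand-in for a provider lookup; a real provider would make an
# HTTP request to the remote service instead of reading from a dictionary.
KNOWN_EMBEDS = {
    'http://www.youtube.com/watch?v=54XHDUOHuzU':
        '<iframe src="http://www.youtube.com/embed/54XHDUOHuzU"></iframe>',
}

# Matches a line that consists only of a URL, optionally padded with spaces.
STANDALONE_URL = re.compile(r'^\s*(https?://\S+)\s*$')


def embed_standalone_urls(text, lookup=KNOWN_EMBEDS.get):
    """Replace each line that is just a known URL with its embed HTML."""
    out = []
    for line in text.split('\n'):
        match = STANDALONE_URL.match(line)
        html = lookup(match.group(1)) if match else None
        # Unknown URLs and ordinary lines pass through unchanged.
        out.append(html if html is not None else line)
    return '\n'.join(out)


result = embed_standalone_urls(
    'this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU')
print(result)
```

Passing a different callable as lookup (for example, one that wraps providers.request and returns the 'html' key) would be one way to connect this sketch to real provider data, though that wiring is left as an assumption here.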

Owner

Charles Leifer
