Python code to crawl computer vision papers from top CV conferences. Currently it supports CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, SIGGRAPH

Overview

Crawling-CV-Conference-Papers

News

  • 2021-6-21 Support CVPR-2021

Download all CVPR-2021 papers in one click. Just set the local download directory in download_cvpr2021.py and run it! Don't forget to have your chrome driver ready (i.e., corresponding version to your Chrome browser)

  • 2021-6-20 Support continuation of downloading from where the program encounters interruption. (prevent re-downloading from scratch)

Introduction

Python code to crawl computer vision papers from top CV conferences. Currently it supports CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, SIGGRAPH. It leverages selenium, a website testing framework to crawl the titles and pdf urls from the conference website, and download them one by one with some simple anti-anti-crawler tricks.

Websites for older conferences are not guaranteed to be bug-free, since this project is based on newest website structure.

Recommend to work with Mendeley. You will get a juicy academic corpus.

Currently only single-thread downloading is implemented. Therefore the downloading for thousands of papers would be slow (takes several hours). It is suggested that you run the script before bed and it would be finished when you get to work again :)

Multi-thread downloading will be coming soon!

Requirements

pip install selenium, slugify

Besides, downlowd chromedriver.exe from the link to any local path you favour.

Usage

To execute the crawler, you could run download.py or download.ipynb (Basically the same). Before the execution, some paths need to be set up, including:

conference = 'neurips'
conference_url = "https://papers.nips.cc/paper/2019" # the conference url to download papers from
chromedriver_path = '.../chromedriver.exe' # the chromedriver.exe path
root = './NeurIPS-2019-ALL' # file path to save the downloaded papers

Here are some conference url examples:

cvpr: https://openaccess.thecvf.com/CVPR2020 (CVPR 2020)
eccv: https://openaccess.thecvf.com/ECCV2018 (ECCV 2018) (changed in 2020)
eccv: https://www.ecva.net/papers.php (ECCV 2020) 
iccv: https://openaccess.thecvf.com/ICCV2019 (ICCV 2019)
icml: http://proceedings.mlr.press/v119/ (ICML 2020)
neurips: https://papers.nips.cc/paper/2020 (NeurIPS 2020)
iclr: https://openreview.net/group?id=ICLR.cc/2021/Conference (ICLR 2021)
siggraph: https://dl.acm.org/toc/tog/2020/39/4 (SIGGRAPH 2020)

Replace the url and the conference names with your choice.

If you want to crawl papers from other conference website, all you need to do is to write a retrieve function like the ones in retrieve_titles_urls_from_websites.py, to parse html code and retrieve the paper titles and pdf urls into two lists.

Others

Warnings: It is heard that crawling from conference websites might cause a banning of your IP (hasn't happened to me so far). Not sure of the risk.

Warnings: This project is for learning purpose only. Do not crawl the same website frequently, which will burden the server.

Welcome to submit a pull request if there is any bugs or if you would like to add support to other conferences!

Maintainer

Xiaoyang Huang

Email: [email protected]

Owner
Xiaoyang Huang
Xiaoyang Huang
Fetch papers and metadata.

Fetch PubMed Central for open-access papers as well as Sci-Hub

4 Oct 31, 2022
Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included

WV-AMZN-4K-RIPPER Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included For CDM You can Mail :- 11 Dec 23, 2021

Noto fonts go universal! Download Noto fonts combined to suit your region (South Asia, SE Asia, Africa-MiddleEast, Europe-Americas).

Go Noto Universal Noto fonts go universal! Download Noto fonts combined to suit your region (South Asia, SE Asia, East Asia, Africa-MiddleEast, Europe

Satish B 67 Jan 06, 2023
Most versatile Telegram torrent and youtube-dl bot.

TorToolkit Telegram So basically Tortoolkit is aimed to be the most versatile torrent leecher and Youtube-DL bot for telegram. This bot is highly cust

Yash Khadse 541 Dec 22, 2022
Gogoanime-dl - Gogoanime downloader for downloading anime.

gogoanime-dl With this script, you can download episodes of your favorite anime from Gogoanime. The current site that's developed against is https://w

1 Jan 06, 2022
Download all games from a public Itch.io Game Jam

Itch Jam Downloader Downloads all games from a public Itch.io Game Jam. What you'll need: Python 3.8+ pip install -r requirements.txt For site mirrori

Dragoon Aethis 19 Dec 07, 2022
Download all posts and comments in a subreddit

subreddit downloader This subreddit downloader downloads all posts and comments in a subreddit For a tutorial to use this program please follow this m

Guneet 6 Dec 16, 2022
Ripurei is a free-to-use osu! replay downloader, that can be configured to download from any osu! server.

Ripurei Ripurei is a fully functional osu! replay downloader, fully capable of downloading from almost any osu! server. Functionality Timeline ✔️ Able

Thomas 0 Feb 11, 2022
MMDL (Mega Music Downloader) - A tool to easily download music.

mmdl - Mega Music Downloader What is mmdl ❓ TLDR: MMDL is a cli app which allows you to quickly and efficiently download one or multiple songs from Yo

techboy-coder 30 Dec 13, 2022
Implementation of Cross-category Video Highlight Detection via Set-based Learning (ICCV 2021).

Cross-category Video Highlight Detection via Set-based Learning Introduction This project is an implementation of ``Cross-category Video Highlight Det

Minghao (Alan) Xu 49 Dec 17, 2022
Yahoo! Finance next gen python 3 / pandas market data downloader

Yahoo! Finance-ng python3 / pandas market data downloader Ever since Yahoo! finance decommissioned their historical data API, many programs that relie

Pedro Larroy 7 Dec 09, 2022
Python module to download all media from a CyberDrop gallery.

CyberDrop Downloader Intro Let's suppose you found out the Eva G (bby_gee) leak on https://cyberdrop.me/a/aWAt4TWY. You wish you could download the en

Quatrecentquatre 1 Dec 12, 2021
📼Command line tool based on youtube-dl to easily download selected channels from your subscriptions.

youtube-cdl Command line tool based on youtube-dl to easily download selected channels from your subscriptions. This tool is very handy if you want to

Anatoly 64 Dec 25, 2022
A python script that discovers hidden YouTube API clients. Just a research project.

YouTube-Internal-Clients A script that discovers hidden internal clients of the YouTube (Innertube) API using bruteforce methods. The script tries cli

David 97 Jan 02, 2023
抖音去水印视频批量下载,完全使用抖音官方接口

TikTokDownload 抖音去水印视频下载,使用抖音官方接口 使用教程(Win7) Win10环境暂时没测,bug情况应该比Win7少 运行软件前先打开目录下 conf.ini 文件按照要求进行配置 批量下载可直接修改配置文件,单一视频下载请直接打开粘贴视频链接即可

JohnserfSeed 2k Jan 04, 2023
Ebook downloader built using python

ebook-downloader Getting Started Open a terminal and run the following commands. git clone github.com/georgemunyoro/ebook-downloader cd ./ebook-downlo

George Munyoro 1 Oct 19, 2021
A cli tool to download purchased products from the DLsite.

dlsite-downloader A cli tool to download purchased products from the DLsite. How can I use? This program runs with configurations defined at settings.

AcrylicShrimp 9 Dec 23, 2022
A tool to make easy to search for directories in the URL.

Welcome to Brutos Directory Scanner 🚀 The Brutos is a python script used to provide agility in obtaining verifications to informations about related

Sérgio Corrêa 4 Apr 14, 2022
A downloader for the ISIS service of TU Berlin

isis_dl A downloading utility for the ISIS tool of TU-Berlin. Version 0.4 Features Downloads all Material from all courses of your ISIS page. Efficien

1 Nov 06, 2021
Downloads separate (specified) file to a randomly generated folder in /TEMP then executes it.

PyTemp-1 A Python3 file downloader. What you do with this code / project / idea is non of my buisness or concern, and this was made for **educational*

NightTab 1 Aug 03, 2022