Automatically download daily arXiv papers and crop key information from them.

Last update: Jul 30, 2022

Related tags

Web Crawling paper deeplearning arxiv

Overview

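Arxiv daily: quick overview

Features: filters each day's newest arXiv papers by keyword, automatically fetches their abstracts, and automatically crops the tables and figures out of each paper.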
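The keyword filtering step can be pictured with a small sketch. This is not the project's own code: the function name `filter_by_keywords`, the dictionary keys `title` and `abstract`, and the sample entries are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def filter_by_keywords(
    papers: Iterable[Dict[str, str]], keywords: Iterable[str]
) -> List[Dict[str, str]]:
    """Keep papers whose title or abstract contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords if k.strip()]
    selected = []
    for paper in papers:
        # Assumed record layout: each paper is a dict with "title" and "abstract".
        text = (paper.get("title", "") + " " + paper.get("abstract", "")).lower()
        if any(k in text for k in lowered):
            selected.append(paper)
    return selected


# Hypothetical sample entries, not real arXiv records.
sample = [
    {"title": "Diffusion models for image synthesis", "abstract": "We study generative models."},
    {"title": "A note on graph colouring", "abstract": "Pure combinatorics."},
]
matches = filter_by_keywords(sample, ["diffusion", "GAN"])
print([p["title"] for p in matches])
```

Plain substring matching is the simplest choice, but short keywords can match inside longer words (for example "gan" inside "organ"), so a word-boundary match may be preferable in practice.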

1 Tested environment

Ubuntu 16+
Python3.7
torch 1.9
Colab GPU

2 Usage

First, download the model weights from baiduyun (extraction code: il87) and place them at code/ParseServer/models/PubLayNet/faster_rcnn_R_50_FPN_3x/model_final.pth
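Before starting the notebook, it may help to confirm the weights file is where the instructions expect it. The following is a minimal sketch, assuming it is run from the project root; the helper name `weights_in_place` is hypothetical and not part of the project.

```python
from pathlib import Path

# Path taken from the instructions above, relative to the project root.
WEIGHTS_RELATIVE = Path(
    "code/ParseServer/models/PubLayNet/faster_rcnn_R_50_FPN_3x/model_final.pth"
)


def weights_in_place(project_root: str = ".") -> bool:
    """Return True if the weights file exists at the expected location."""
    return (Path(project_root) / WEIGHTS_RELATIVE).is_file()


if weights_in_place():
    print("Weights found.")
else:
    print(f"Weights missing; expected at {WEIGHTS_RELATIVE}")
```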

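2.1 Environment setup

The tool can be run locally or on Colab; the steps below use a local run as the example.

1. Install the GPU version of PyTorch beforehand.
2. Start jupyter notebook in the project root and run Overview_RUNME_Local.ipynb.
3. On the first run, install the environment first.

4. Start the document layout analysis service, and confirm that it has started correctly before moving on to the next step.

5. Fill in the keywords to filter by as needed. Set needPDF=True if you need the PDF files, and needZip=True if you need the results packed into an archive.

6. Once started, downloading and document layout analysis run at the same time, cropping the required content. Downloaded files are saved under the ./arxiv directory; if needZip=True, a ./arxiv.zip file is also produced.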
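The packing behaviour described in step 6 can be sketched with the standard library. This is an illustration under assumptions, not the notebook's actual code: the function `pack_results` and its `need_zip` parameter are hypothetical stand-ins for the needZip option.

```python
import shutil
from pathlib import Path
from typing import Optional


def pack_results(output_dir: str = "./arxiv", need_zip: bool = True) -> Optional[Path]:
    """Zip output_dir into <output_dir>.zip when need_zip is True.

    Returns the archive path, or None if nothing was packed.
    """
    out = Path(output_dir).resolve()
    if not need_zip or not out.is_dir():
        return None
    # make_archive appends ".zip" to the base name itself, so ./arxiv
    # becomes ./arxiv.zip next to the directory.
    archive = shutil.make_archive(str(out), "zip", root_dir=str(out))
    return Path(archive)


if __name__ == "__main__":
    print(pack_results("./arxiv", need_zip=True))
```

Resolving the directory to an absolute path first keeps the archive location independent of the current working directory.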

2.2 Colab

Compress the code directory and upload it to the root of Google Drive.
Run Overview_RUNME_Colab.ipynb in Colab; the remaining steps are the same as in 2.1.

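3 Results

After extracting the archive locally, the results can be viewed with Typora, a markdown viewer.

The abs.md file in each folder holds the description of the corresponding PDF.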
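For readers who prefer to browse the results from a script rather than a markdown viewer, the per-paper abs.md files can be gathered as sketched below. The layout is an assumption: one folder per paper directly under the output directory, each containing an abs.md file. The function name `collect_abstracts` is hypothetical.

```python
from pathlib import Path
from typing import Dict


def collect_abstracts(output_dir: str = "./arxiv") -> Dict[str, str]:
    """Map each paper folder name to the text of its abs.md file."""
    abstracts: Dict[str, str] = {}
    root = Path(output_dir)
    if not root.is_dir():
        return abstracts
    # Assumed layout: <output_dir>/<paper folder>/abs.md
    for abs_file in sorted(root.glob("*/abs.md")):
        abstracts[abs_file.parent.name] = abs_file.read_text(encoding="utf-8")
    return abstracts


if __name__ == "__main__":
    for folder, text in collect_abstracts().items():
        print(folder, len(text), "characters")
```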

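P.S. Irregular typesetting in a paper leads to messy crops, which also indirectly says something about the paper's quality.

Other

P.S. The code was piled up in a "good enough if it works" spirit. If you find a bug, open an issue with a clear description; the project is maintained periodically.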

Owner

HeoLis

Interested in generative methods.

