Automatically download and crop key information from the arxiv daily paper. (cpu version)

Related tags

DownloaderFocusAX
Overview

FocusAX

按关键词筛选arxiv每日最新paper或从arxiv搜索。

  • 自动下载、获取摘要、自动截取文中表格和图片。

安装必要的环境

  • 安装 paddle
# GPU安装
python3 -m pip install paddlepaddle-gpu==2.1.1 -i https://mirror.baidu.com/pypi/simple

# CPU安装
 python3 -m pip install paddlepaddle==2.1.1 -i https://mirror.baidu.com/pypi/simple
  • 安装 Layout-Parser
=2.2"">
pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
pip install "paddleocr>=2.2"
  • 按照其他必要的包
pip3 install -r requirements.txt
  • 下载模型权重
  • PubLayNet 下载解压后放置在paperparse目录下。目录结构如下
FocusAX
    - paperparse
        - ppyolov2_r50vd_dcn_365e_publaynet
            - inference.pdiparams
            - inference.pdiparams.info
            - inference.pdmodel
        - ...
    - downloader
        - ...
    - utils
        - ...
    - configs.py
    - focus_daily.py
    - focus_search.py
    - README.py
    - ...

使用教程

  • configs.py :程序参数配置文件
# =============== 网络代理 ================
# proxy = None # 不使用代理
proxy = {"http": "socks5://127.0.0.1:8080", "https": "socks5://127.0.0.1:8080"}
# =============== 保存文件根目录 ================
root_path = "./arxiv"
# =============== DNN模型推理配置信息 ================
threshold = 0.5
enable_mkldnn = True
enforce_cpu = True
thread_num = 4
  • focus_daily.py :按关键字过滤arxiv daily上的文章(仅当日)
if __name__ == '__main__':
    key_words = ['GAN'] # 要包含的关键词
    subject_words = ['ML', 'CV', 'AI']  # 要包含的类别
    start_parse(key_words, subject_words, needPDF=True, needZip=False)
  • focus_search.py :按关键字在arxiv检索
start_parse('Keyword')
  • root_path 目录中将创建新的文件夹保存结果

效果图

每个文件夹中的abs.md文件保留的是当前pdf的介绍,使用Typora等markdown编辑器打开。 image

image

ps:论文排版不规范会导致截图混乱。

其他

Owner
HeoLis
Interesting in generate methods.
HeoLis
Aline file downloader automator!

AlineDorker Aline is used for donwloading files with google dorking , dowloading specified links such as dorks. Dependences: python3 installed pip ins

27 Nov 16, 2022
Download all posts and comments in a subreddit

subreddit downloader This subreddit downloader downloads all posts and comments in a subreddit For a tutorial to use this program please follow this m

Guneet 6 Dec 16, 2022
Spy Ad Network - Spy Ad Network Detection With Python

Spy Ad Network Spy Ad Network Detection Jumps from link to link to access a site

Baris Dincer 2 Jan 13, 2022
Download your bandcamp collection using this python script.

bandcamp-downloader Download your Bandcamp collection using this python script. It requires you to have a browser with a logged in session of bandcamp

72 Dec 20, 2022
A youtube-dl fork with additional features and fixes

yt-dlp is a youtube-dl fork based on the now inactive youtube-dlc. The main focus of this project is adding new features and patches while also keepin

yt-dlp 37.1k Jan 03, 2023
Will load an SRC page, logged in with Firefox's cookies imported, and delete all comments from every run

SRCCommentsAutoDeleter Will load an SRC page, logged in with a support browser's cookies, and delete all comments from every run Config is all done in

3 Oct 29, 2021
Download from HBO-MAX-BLIM-TV-Paramount

#HBO MAX- BlimTV -Paramount plus 4K Downloader Tool To download 4K HDR DV SDR from HBO MAX- BlimTV -Paramount plus Hello Fellow Developers/ ! Hi! M

4 Dec 25, 2021
Download images where login is required using har python and js

이미지 다운로드(har, python, js 사용) 로그인이 필요한 사이트에서 DevTools로 이미지를 다운받는 방법은 조금 까다로웠다. 가장 쉽게 할 수 있는 방법을 찾아보았다. 사용법 F12를 눌러 DevTools를 실행 Network 탭으로 이동 페이지 새로고침

0 Jul 22, 2022
📼Command line tool based on youtube-dl to easily download selected channels from your subscriptions.

youtube-cdl Command line tool based on youtube-dl to easily download selected channels from your subscriptions. This tool is very handy if you want to

Anatoly 64 Dec 25, 2022
A bot to download songs from YouTube to telegram.

Song-Downloader-Bot A BOT TO DOWNLOAD SONGS FROM YOUTUBE. Mandatory variables API_ID - Get It From my.telegram.org API_HASH - Get It From my.telegram.

Ashik Muhammed 38 Dec 11, 2022
ComicDownloader - Downloads Comics from readcomiconline.li

ComicDownloader Downloads Comics from readcomiconline.li To use this script from

2 Nov 08, 2022
python code used to download all images contained in a facebook uid , the uid can be profile,group,fanpage

python code used to download all images contained in a facebook uid , the uid can be profile,group,fanpage

VVHai 2 Dec 21, 2021
Mobile based API for Crunchyroll BETA (and Downloader).

Mobile based API for Crunchyroll BETA (and Downloader). Not restricted on servers and NO CLOUDFLARE

27 Dec 11, 2022
A cross-platform python based utility to download courses from udemy for personal offline use.

udemy-dl A cross-platform python based utility to download courses from udemy for personal offline use. Warning Udemy has started to encrypt many of t

Nasir Khan 4.6k Dec 31, 2022
Download all games from a public Itch.io Game Jam

Itch Jam Downloader Downloads all games from a public Itch.io Game Jam. What you'll need: Python 3.8+ pip install -r requirements.txt For site mirrori

Dragoon Aethis 19 Dec 07, 2022
Discord Nitro Generator + Checker

Discord Nitro Generator + Checker Usage Download the project files and run main.py You will be prompted with 2 questions the first one being the amoun

509 Jan 02, 2023
AkShare is an elegant and simple financial data interface library for Python, built for human beings! 开源财经数据接口库

Overview AkShare requires Python(64 bit) 3.7 or greater, aims to make fetch financial data as convenient as possible. Write less, get more! Documentat

Albert King 5.8k Jan 03, 2023
TikTok downloader video without watermark from Telegram bot

⬇️ How to download video from Tik Tok via telegram bot? Send a link to the video from tik tok to our telegram bot and it will send you a video without

1 Mar 04, 2022
Fully automated download and parsing for Texas A&M University's Registrar's grade distribution PDFs for years 2014+.

Fully automated download and parsing for Texas A&M University's Registrar's grade distribution PDFs for years 2014+. Adds the parsing results to a mySQL database.

TAMU Grade Distribution 1 Sep 28, 2022
抖音批量下载助手

抖音批量下载助手

HuangSK 303 Jan 05, 2023