Python wrapper for Xeno-canto API 2.0. Enables downloading bird data with one command line

Overview

birdData

BirdData is a python wrapper for Xeno-canto API 2.0. Enables user to download bird data with one command line. BirdData supports multithreading download.

Environment

Download repo to local:

git clone [email protected]:realzza/birdData.git

Set up environment:

pip install -r requirement.txt

Usage

Single-thread

Download audio data for one bird species. Use scientific name starting with lowercase. e.g, cettia cetti.

python download.py --name "cettia cetti"

Download audio data for a file of species names. Format requirement: names divided by "\n"

python download.py --name name_file

General Usage:

usage: download.py [-h] --name NAME

download bird audios

optional arguments:
  -h, --help   show this help message and exit
  --name NAME  [1] name of one bird species; [2] file of bird species spaced
               by '\n'

Multi-thread

Speed up downloading using multiple threads.

python download-mult.py --name "cettia cetti" --process-ratio 0.6

Download multiple birds in a file, format requirement: names divided by "\n"

python download-mult.py --name name_file --process-ratio 0.6

General Usage:

usage: download-mult.py [-h] --name NAME [--process-ratio PROCESS_RATIO]

download bird audios

optional arguments:
  -h, --help            show this help message and exit
  --name NAME           [1] name of one bird species; [2] file of bird species
                        spaced by '\n'
  --process-ratio PROCESS_RATIO
                        float[0~1], define cpu utilities in downloading audios
                        [default: 0.8]

To-do

  • [12.29] multiprocess download
  • define sample rate prior to download

Contact

Feel free to file an issue had you encountered any problems. Have fun!

You might also like...
PRAW, an acronym for "Python Reddit API Wrapper", is a python package that allows for simple access to Reddit's API.

PRAW: The Python Reddit API Wrapper PRAW, an acronym for "Python Reddit API Wrapper", is a Python package that allows for simple access to Reddit's AP

A Python Instagram Scraper for Downloading Profile's Posts, stories, ProfilePic and See the Details of Particular Instagram Profile.
A Python Instagram Scraper for Downloading Profile's Posts, stories, ProfilePic and See the Details of Particular Instagram Profile.

✔ ✔ InstAstra ⚡ ⚡ ⁜ Description ~ A Python Instagram Scraper for Downloading Profile's Posts, stories, ProfilePic and See the Details of Particular In

Python API wrapper around Trello's API

A wrapper around the Trello API written in Python. Each Trello object is represented by a corresponding Python object. The attributes of these objects

Async ready API wrapper for Revolt API written in Python.

Mutiny Async ready API wrapper for Revolt API written in Python. Installation Python 3.9 or higher is required To install the library, you can just ru

A Python API wrapper for the Twitter API!

PyTweet PyTweet is an api wrapper made for twitter using twitter's api version 2! Installation Windows py3 -m pip install PyTweet Linux python -m pip

Python API wrapper library for Convex Value API

convex-value-python Python API wrapper library for Convex Value API. Further Links: Convex Value homepage @ConvexValue on Twitter JB on Twitter Authen

This an API wrapper library for the OpenSea API written in Python 3.

OpenSea NFT API Python 3 wrapper This an API wrapper library for the OpenSea API written in Python 3. The library provides a simplified interface to f

YARSAW is an Async Python API Wrapper for the Random Stuff API.

Yet Another Random Stuff API Wrapper - YARSAW YARSAW is an Async Python API Wrapper for the Random Stuff API. This module makes it simpler for you to

EpikCord.py - This is an API Wrapper for Discord's API for Python

EpikCord.py - This is an API Wrapper for Discord's API for Python! We've decided not to fork discord.py and start completely from scratch for a new, better structuring system!

Comments
  • ZeroDivisionError

    ZeroDivisionError

    您好,我在尝试批量下载的时候,发现当查询结果为0个的时候,会出现 ZeroDivisionError: integer division or modulo by zero 的错误,查询trackback,错误源头在utils.py文件的第4行portion = len(lst) //n的位置。

    除此之外,我还发现当查询以下物种名称的时候也会出现同样的error。我的查询时间范围是从1971-01-01:

    ['Maleo', 'Moluccan Megapode', 'Nicobar Megapode', 'Long-billed Partridge', 'Black Partridge', 'Udzungwa Forest Partridge', 'Rubeho Forest Partridge', "Roll's Partridge", 'Sumatran Partridge', 'Grey-breasted Partridge', 'Red-billed Partridge', 'Chestnut-necklaced Partridge', 'Crestless Fireback', 'Crested Fireback', 'Siamese Fireback', 'Great Argus', 'Madagascan Pochard']
    

    以下这个list是对应以上的common_name的查询结果数量

    [7, 6, 2, 18, 3, 6, 34, 10, 1, 31, 8, 35, 1, 17, 24, 83, 5]
    

    期待回复,感谢!

    small fix 
    opened by ZhongJunhong 1
  • Something about Download Timeout

    Something about Download Timeout

    您好,我在下载的时候发现一些音频文件在某些情况下(可能是网络的问题)下载时间过长,而且进度条一直停滞不前,有时甚至停滞十分钟到二十分钟不等,有时候我只能手动中断使之进入下一个循环。

    后来我在stackoverflow上找到一个非常非常简单的解决方案,就是在执行请求的代码前加入socket.setdefaulttimeout(30),控制socket打开的时间,比如此处我设置为30秒。

    socket.setdefaulttimeout(30) 
    q.retrieve_recordings(multiprocess=True, nproc=10, attempts=10, outdir="/mnt/database/xcdata/")
    

    同时我加入一些其他设置,让代码在触发socket.timeout错误后,可以将对应的请求参数记录下来,并继续执行下一个循环。由此可以避免下载停滞的问题。当我完成所有循环以后,会将socket打开的时间再增加(比如增加到120秒),对先前记录下的请求参数再次执行。由此循环多次以将所有参数请求完毕。

    https://stackoverflow.com/questions/32763720/timeout-a-file-download-with-python-urllib

    希望能帮助到大家。

    Hello, when I was downloading, I found that the download time of some audio files was too long under certain circumstances (may be a problem with the network), and the progress bar has been stagnant, sometimes even for ten to twenty minutes, there are Sometimes I can only manually interrupt it to enter the next cycle.

    Later, I found a very, very simple solution on stackoverflow, which is to add socket.setdefaulttimeout(30) before executing the requested code to control the opening time of the socket. For example, I set it to 30 seconds here.

    socket.setdefaulttimeout(30) 
    q.retrieve_recordings(multiprocess=True, nproc=10, attempts=10, outdir="/mnt/database/xcdata/")
    

    At the same time, I added some other settings so that after the code triggers the socket.timeout error, it can record the corresponding request parameters and continue to execute the next loop. This can avoid the problem of download stagnation. When I finish all the loops, I will increase the socket opening time (for example, to 120 seconds), and execute again on the previously recorded request parameters. Loop for multiple times to complete the request for all parameters.

    https://stackoverflow.com/questions/32763720/timeout-a-file-download-with-python-urllib

    I Hope that can help everyone.

    opened by ZhongJunhong 3
Releases(v0.0.4)
  • v0.0.4(May 13, 2022)

    • Support Query by bird name.
    • Automatically cut inessential processes in query traffic.
    • Optimized query assignment strategy in recording retrieval.
    Source code(tar.gz)
    Source code(zip)
Owner
_zza
Computational (Para)linguistics | Multimodal Analysis | Founder of @Presento | Previously @ByteDance AI-lab Shanghai
_zza
A Telegram bot to extracting text from images. All languages supported.

OCR Bot A Telegram bot to extracting text from images. All languages supported. Deploy to Heroku Local Deploying Clone the repo git clone https://gith

6 Oct 21, 2022
An advanced Filter Bot with nearly unlimitted filters!

Unlimited Filter Bot ㅤㅤㅤㅤㅤㅤㅤ ㅤㅤㅤㅤㅤㅤㅤ An advanced Filter Bot with nearly unlimitted filters! Features Nearly unlimited filters Supports all type of fil

1 Nov 20, 2021
Scrape the Twitter Frontend API without authentication.

Twitter Scraper 🇰🇷 Read Korean Version Twitter's API is annoying to work with, and has lots of limitations — luckily their frontend (JavaScript) has

Buğra İşgüzar 3.4k Jan 08, 2023
Pysauce is a Discord bot which utilizes the SauceNAO API to locate the source of images.

Pysauce Pysauce is a Discord bot which utilizes the SauceNAO API to locate the source of images. Use Pysauce has one public instance always running, i

Akira 2 Oct 04, 2022
Marketplace for self published books

Nile API API for the imaginary Nile marketplace for self published books. This is a project created to try out FastAPI as the post promising ASGI serv

Matt de Young 1 Jan 31, 2022
Python-random-quote - A file-based quote bot written in Python

Let's Write a Python Quote Bot! This repository will get you started with building a quote bot in Python. It's meant to be used along with the Learnin

amir mohammad fateh 1 Jan 02, 2022
AI-El-Yazisini-Tanima - Fotoğraflardaki El Yazını Yapay Zeka İle Otomatik Tanıma Yazılımı

AI-El Yazısını Tanıma Fotoğraflardaki El Yazını Yapay Zeka İle Otomatik Tanıma Yazılımı Amaç : Birden fazla makine öğrenmesi modelini bir arada kullan

Özgür Tokay 3 Mar 02, 2022
The open source version of Tentro - A multipurpose Discord bot.

Welcome to Tentro 👋 A multipurpose Discord bot. 🏠 Homepage Install pip install -r requirements.txt Usage py Tentro.py Contributors 👤 Tentro Dev Tea

6 Jul 14, 2022
A simple API Wrapper for Guilded.

Guildr A simple API Wrapper for Guilded. Frequently updated! I am not a user of Guilded, meaning I do not keep track of new Guilded updates or patches

2 Mar 07, 2022
Create Fast and easy image datasets using reddit

Reddit-Image-Scraper Reddit Reddit is an American Social news aggregation, web content rating, and discussion website. Reddit has been devided by topi

Wasin Silakong 4 Apr 27, 2022
Async boto3 with Autogenerated Data Classes

awspydk Async boto3 with Autogenerated JIT Data Classes Motivation This library is forked from an internal project that works with a lot of backend AW

1 Dec 05, 2021
A bot to get Statistics like the Playercount from your Minecraft-Server on your Discord-Server

Hey Thanks for reading me. Warning: My English is not the best I have programmed this bot to show me statistics about the player numbers and ping of m

spaffel 12 Sep 24, 2022
Набор утилит для Discord с использованием языка программирования Python.

Discord Tools v0.1 Functions: WebHook spamer Spotify account generator (What?) QR Code Token stealer Token generator Discord nitro gen/check Discor to

Максим Скризов 3 Aug 23, 2022
Auto filter bot for python

Media Search bot Index channel or group files for inline search. When you post file on telegram channel or group this bot will save that file in datab

1 Dec 22, 2021
ELiza music is a telegram music bot project, allow you to play music on voice chat group telegram.

❤️ 𝗘𝗹𝗶𝘇𝗮 𝗠𝘂𝘀𝗶𝗰 ❤️ Unmaintained. The new repo of @MrsElizaRobot is private. (It is no longer based on this source code. The completely rewrit

Team Eliza 2 Dec 08, 2022
New developed moderation discord bot by archisha

Monitor42 New developed moderation discord bot by αrchιshα#5518. Details Prefix: 42! Commands: Moderation Use 42!help to get command list. Invite http

Kamilla Youver 0 Jun 29, 2022
Project for the discipline of Visual Data Analysis at EMAp FGV.

Analysis of the dissemination of fake news about COVID-19 on Twitter This project was the final work for the discipline of Visual Data Analysis of the

Giovani Valdrighi 2 Jan 17, 2022
A modular Telegram Python bot running on python3 with a sqlalchemy database

Nao Tomori Robot Found Me On Telegram As Nao Tomori 🌼 A modular Telegram Python bot running on python3 with a sqlalchemy database. How to setup/deplo

Sena 84 Jan 04, 2023
批量下载抖音无水印视频

版权说明 本项目fork自Johnserf-Seed TikTokDownload。目的是为了增加个性化的功能,若想体验更多完善的功能请支持原作者的项目。 免责声明 本代码仅用于学习,下载后请勿用于商业用途。 环境要求 请检查宿主机,是否安装了python环境,并且配置了环境变量 pytho

Zhiwu Mao 44 Dec 28, 2022
A simple telegram bot to recognize lengthy voice files to text and vice versa with multiple language support.

Voicebot A simple Telegram bot to convert lengthy voice clips to text and vice versa with supporting languages. Mandatory Variables API_HASH - Yo

Renjith Mangal 12 Oct 21, 2022