A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file that contains some sample data collected with the package in the past, with some new sample data that is also collected with the package.

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data, currently the Reddit API is inconsistent with suspended and deleted author data, so this functionality has not been built in yet.

Testing

After installation, you can launch the test suite, which is contained in the tests/tests.py. Note that you will have to have pytest >= 6.2.5 installed. You can launch the test suite by following these steps from the projects root directory:

  1. Open up tests.py with the following command:
open tests/tests.py

Comment out lines 24 to 30. Change the values in DataCollector() in line 32 to your Reddit credentials.

  1. Run the following command:
pytest tests/test.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file was inspired by the scikit-learn README.

You might also like...
Auto Join: A GitHub action script to automatically invite everyone to the organization who comment at the issue page.

Auto Invite To Org By Issue Comment A GitHub action script to automatically invite everyone to the organization who comment at the issue page. What is

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.
Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker. Unlimited RajeLiker Credit Hack. Thanks To RajeLiker.

A simple Discord bot that can fetch definitions and post them in chat.
A simple Discord bot that can fetch definitions and post them in chat.

A simple Discord bot that can fetch definitions and post them in chat. If you are connected to a voice channel, the bot will also read out the definition to you.

A simple fun discord bot using discord.py that can post memes

A simple fun discord bot using discord.py * * Commands $commands - to see all commands $meme - for a random meme from the internet $cry - to make the

One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind them.

AwesomeVersion One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind

A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel
A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel

CovidLive AU Summary Slackbot This bot is a very simple slackbot that pulls data, summarises and posts up to date AU COVID stats to a provided slack c

Track live sentiment for stocks from Reddit and Twitter and identify growing stocks
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

A reddit.com bot that will return reference links from official python documentation site for the standard library.

Python Docs Bot A reddit.com bot that will return documentation links for the library and language reference sections of the python docs website. The

A Python bot that uses the Reddit API to send users inspiring messages.

AnonBot By Edric Antoine A Python bot that uses the Reddit API to send users inspiring messages. When a message includes 'What would Anon do?', the bo

Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Spelling error of separate argument in to_pandas function of reddit_data_collector.io.py, previously it was spelt like seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science
Nico Van den Hooff
IMDbPY is a Python package useful to retrieve and manage the data of the IMDb movie database about movies, people, characters and companies

IMDbPY is a Python package for retrieving and managing the data of the IMDb movie database about movies, people and companies. Revamp notice Starting

Davide Alberani 1.1k Jan 02, 2023
discord.py bot written in Python.

bakerbot Bakerbot is a discord.py bot written in Python :) Originally made as a learning exercise, now used by friends as a somewhat useful bot and us

8 Dec 04, 2022
Eva Maria Bot With Python

Eva Maria Bot Features Auto Filter Manual Filter IMDB Admin Commands Broadcast Index IMDB search Inline Search Random pics ids and User info Stats, Us

Aadhi 3 Jan 06, 2022
Create Discord Accounts Semi-Automatically Without Captcha Solving API Key

Discord-Account-Generator Create Discord Accounts Semi-Automatically without captcha solving api key IMPORTANT: Your chromedriver version should be th

NotSakshyam 11 Mar 21, 2022
A multipurpose Telegram Bot written in Python for mirroring files on the Internet to Google Drive

Mirror Leech Bot Mirror Leech Bot is a multipurpose Telegram Bot written in Python for mirroring files on the Internet to our beloved Google Drive. Ba

1 Jan 01, 2022
Visualize size of directories, s3 buckets.

Dir Sizer This is a work in progress, right now consider this an Alpha or Proof of Concept level. dir_sizer is a utility to visualize the size of a di

Scott Seligman 13 Dec 08, 2022
Video Stream is a telegram bot project that's allow you to play video on telegram group video chat

Video Stream is a telegram bot project that's allow you to play video on telegram group video chat 🚀 Get SESSION_NAME from below: Pyrogram ## ✨ Featu

1 Nov 10, 2021
MCNameBot is a fast discord bot that is used to check the availability of a Minecraft name with a simple command.

MCNameBot MCNameBot is a fast discord bot that is used to check the availability of a Minecraft name with a simple command. If you would like to just

Killin 2 Oct 11, 2022
Cutting-edge GitHub page customization tool

Cutting-edge GitHub page customization tool Want to customize your GitHub user page, but don't know how? Now you can make your profile unique and attr

Igor Vaiman 32 Aug 24, 2022
The Discord bot framework for Python

Pycordia ⚠️ Note! As of now, this package is under early development so functionalities are bound to change drastically. We don't recommend you curren

Ángel Carias 24 Jan 01, 2023
SOLSEA-NFT-EXPLORE - Using Streamlit to build a simple UI on top of the Solana API

SOLSEA NFT Explorer Using Streamlit to build a simple UI on top of the Solana AP

Devin Capriola 3 Mar 19, 2022
PunkScape Discord bot to lookup rarities, create diptychs and more.

PunkScape Discord Bot A Discord bot created for the Discord server of PunkScapes, a banner NFT project. It was intially created to lookup rarities of

Akuti 4 Jun 24, 2022
ALIEN: idA Local varIables rEcogNizer

ALIEN: idA Local varIables rEcogNizer ALIEN is an IDA Pro plugin that allows the user to get more information about ida local variables with the help

16 Nov 26, 2022
数字货币BTC量化交易系统-实盘行情服务器,虚拟币自动炒币-火币API-币安交易所-量化交易-网格策略。趋势跟踪策略,最简源码,可在线回测,一键部署,可定制的比特币量化交易框架,3年实盘检验!

huobi_intf 提供火币网的实时行情服务器(支持火币网所有交易对的实时行情),自带API缓存,可用于实盘交易和模拟回测。 行情数据,是一切量化交易的基础,可以获取1min、60min、4hour、1day等数据。数据能进行缓存,可以在多个币种,多个时间段查询的时候,查询速度依然很快。 服务框架

dev 258 Sep 20, 2021
TG-Streaming-bot - TG Simple Streaming bot

TG Simple Streaming bot telegram video straming bot 🎚️ Features Play youtube li

HyDrix 4 May 05, 2022
Auto-Approved-Bot - Auto Approved Invaite Link Request Telegram Bot

🤖 𝗔𝘂𝘁𝗼-𝗔𝗽𝗽𝗿𝗼𝘃𝗲-𝗕𝗼𝘁 🤖 ℹ️ 𝗨𝘀𝗲𝗴𝗲 ℹ️ When a join request invita

Muhammed 32 Dec 18, 2022
A simple test repo created following docker docs.

docker_sampleRepo A simple test repo created following docker docs. Link to docs: https://docs.docker.com/language/python/develop/ Other links: https:

Suraj Verma 2 Sep 16, 2022
Want to play What Would Rather on your Server? Invite the bot now!😏

What is this Bot? 👀 What You Would Rather? is a Guessing game where you guess one thing. Long Description short Take this example: You typed r!rather

丂ㄚ么乙ツ 2 Nov 17, 2021
QR login for pyrogram client

Generate Pyrogram session via QRlogin

ポキ 18 Oct 21, 2022