A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file that contains some sample data collected with the package in the past, with some new sample data that is also collected with the package.

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data, currently the Reddit API is inconsistent with suspended and deleted author data, so this functionality has not been built in yet.

Testing

After installation, you can launch the test suite, which is contained in the tests/tests.py. Note that you will have to have pytest >= 6.2.5 installed. You can launch the test suite by following these steps from the projects root directory:

  1. Open up tests.py with the following command:
open tests/tests.py

Comment out lines 24 to 30. Change the values in DataCollector() in line 32 to your Reddit credentials.

  1. Run the following command:
pytest tests/test.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file was inspired by the scikit-learn README.

You might also like...
Auto Join: A GitHub action script to automatically invite everyone to the organization who comment at the issue page.

Auto Invite To Org By Issue Comment A GitHub action script to automatically invite everyone to the organization who comment at the issue page. What is

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.
Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker. Unlimited RajeLiker Credit Hack. Thanks To RajeLiker.

A simple Discord bot that can fetch definitions and post them in chat.
A simple Discord bot that can fetch definitions and post them in chat.

A simple Discord bot that can fetch definitions and post them in chat. If you are connected to a voice channel, the bot will also read out the definition to you.

A simple fun discord bot using discord.py that can post memes

A simple fun discord bot using discord.py * * Commands $commands - to see all commands $meme - for a random meme from the internet $cry - to make the

One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind them.

AwesomeVersion One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind

A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel
A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel

CovidLive AU Summary Slackbot This bot is a very simple slackbot that pulls data, summarises and posts up to date AU COVID stats to a provided slack c

Track live sentiment for stocks from Reddit and Twitter and identify growing stocks
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

A reddit.com bot that will return reference links from official python documentation site for the standard library.

Python Docs Bot A reddit.com bot that will return documentation links for the library and language reference sections of the python docs website. The

A Python bot that uses the Reddit API to send users inspiring messages.

AnonBot By Edric Antoine A Python bot that uses the Reddit API to send users inspiring messages. When a message includes 'What would Anon do?', the bo

Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Spelling error of separate argument in to_pandas function of reddit_data_collector.io.py, previously it was spelt like seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science
Nico Van den Hooff
discord token grabber scam - eductional purposes only!

Discord-QR-Scam תופס אסימון תמונה של Discord על אודות סקריפט Python שיוצר אוטומטית קוד QR הונאה של Nitro ותופס את אסימון הדיסקורד בעת סריקה. כלי זה מד

Amit Pinchasi 0 May 22, 2022
News API consisting various sources from Tanzania

Tanzania News API News API consisting various sources from Tanzania. Fork the project Clone the project git clone https://github.com/username/news-a

Innocent Zenda 6 Oct 06, 2022
Automatically searching for vaccine appointments

Vaccine Appointments Automatically searching for vaccine appointments Usage To copy this package, run: git clone https://github.com/TheIronicCurtain/v

58 Apr 13, 2021
a translator bot for discord

TranslatorBOT it is a simple and powerful discord bot, it been used for translating includes more than 100 language, it has a lot of integrated comman

Mear. 2 Feb 03, 2022
Python3 program to control Elgato Ring Light on your local network without Elgato's Control Center software

Elgato Light Controller I'm really happy with my Elgato Key Light from an illumination perspective. However, their control software has been glitchy f

Jeff Tarr 14 Nov 16, 2022
Your custom slash commands Discord bot!

Slashy - Your custom slash-commands bot Hey, I'm Slashy - your friendly neighborhood custom-command bot! The code for this bot exists because I'm like

Omar Zunic 8 Dec 20, 2022
Replacement for the default Dark Sky Home Assistant integration using Pirate Weather

Pirate Weather Integrations This integration is designed to replace the default Dark Sky integration in Home Assistant with a slightly modified, but f

Alexander Rey 129 Jan 06, 2023
Web app for spotify playlist management with last.fm integration

Music Tools Set of utility tools for Spotify and Last.fm. Built on my other libraries for Spotify (spotframework), Last.fm (fmframework) and interfaci

andy 3 Dec 14, 2022
A very basic starter bot based on CryptoKKing with a small balance

starterbot A very basic starter bot based on CryptoKKing with a small balance, use at your own risk. I have since upgraded this script significantly a

Danny Kendrick 2 Dec 05, 2021
A free sniper bot built to work with PancakeSwap: Router V2

Pancakeswap Sniper Bot PancakeSwap sniper bot. Automated sniping bot to snipe crypto coin launches. How it works The sniping bot can be used in three

89 Aug 06, 2022
A Discord bot that may save your day by predicting it.

Sage A Discord bot that may save your day by predicting it.

1 Nov 17, 2022
Telegram Group Management Bot based on Pyrogram

Komi-San Telegram Group Management Bot based on Pyrogram More updates coming soon Support Group Open a Pull request if you wana contribute Example for

33 Nov 07, 2022
Adriano's Diets Consulting Bot - Parses and extracts informations about your diet (files in the Adriano's format).

Adriano's Diets Consulting Bot - Parses and extracts informations about your diet (files in the Adriano's format).

Marco A. 2 Feb 07, 2022
Build better AWS infrastructure

Sceptre About Sceptre is a tool to drive AWS CloudFormation. It automates the mundane, repetitive and error-prone tasks, enabling you to concentrate o

sceptre 1.4k Jan 04, 2023
Telegram bot for logistic - Telegram bot for logistic

Демонстрационный телеграм-бот для нужд транспортной компании Цель проекта Реализ

M1chigun 1 Feb 05, 2022
Discord bot that plays cricket with the user

CricBot Table of content Commands Installation Game rules License Commands S.No Command Use 1. cric Open the home window. This command is not necessa

Raveesh Yadav 1 Nov 19, 2021
AWS Quick Start Team

EKS CDK Quick Start (in Python) DEVELOPER PREVIEW NOTE: Thise project is currently available as a preview and should not be considered for production

AWS Quick Start 83 Sep 18, 2022
A Flask & Twilio Secret Santa app.

🎄 ✨ Secret Santa Twilio ✨ 📱 A contactless Secret Santa game built with Python, Flask and Twilio! Prerequisites 📝 A Twilio account. Sign up here ngr

Sangeeta Jadoonanan 5 Dec 23, 2021
Group Management Bot

❤️ 𝗦𝗛𝗔𝗗𝗜𝗬𝗢 ❤️ A Powerful, Smart And Advance Group Manager ... Written with AioGram , Pyrogram and Telethon... ⭐️ Thanks to everyone who starred

Abdisamad Omar Mohamed 4 Dec 01, 2021
LOL-banner - A discord bot that bans anybody playing league of legends

LOL-banner A discord bot that bans anybody playing league of legends This bot ha

bsd_witch 46 Dec 17, 2022