A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.
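
For context, the sketch below shows roughly what collecting posts with PRAW directly involves, which is the kind of work the package aims to hide. It is not the package's own code: the helper names (make_client, submissions_to_rows, collect_hot_posts) and the chosen columns are assumptions made for illustration, and only standard PRAW calls (praw.Reddit, subreddit(...).hot(...)) are used.

```python
from datetime import datetime, timezone


def make_client(client_id, client_secret, user_agent):
    """Create a read-only PRAW client from Reddit app credentials."""
    import praw  # imported here so the helpers below work without PRAW installed

    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def submissions_to_rows(submissions):
    """Flatten submission objects into plain dictionaries (hypothetical columns)."""
    rows = []
    for post in submissions:
        rows.append(
            {
                "post_id": post.id,
                "title": post.title,
                "score": post.score,
                "num_comments": post.num_comments,
                # Reddit reports creation time as a Unix timestamp in UTC.
                "created": datetime.fromtimestamp(post.created_utc, tz=timezone.utc),
            }
        )
    return rows


def collect_hot_posts(reddit, subreddit_name, limit=10):
    """Fetch up to `limit` hot posts from one subreddit and flatten them."""
    return submissions_to_rows(reddit.subreddit(subreddit_name).hot(limit=limit))
```

With real credentials, `collect_hot_posts(make_client(...), "python")` would return a list of dictionaries that can be passed straight to `pandas.DataFrame`.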

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file that holds sample data previously collected with the package, by adding newly collected sample data to it.
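
The idea behind the third item can be sketched with plain pandas. This is a minimal illustration of the technique (concatenate, then drop rows whose ID is already stored), not the package's implementation; the function name update_csv and the post_id column are assumptions.

```python
from pathlib import Path

import pandas as pd


def update_csv(csv_path, new_data, id_column="post_id"):
    """Append new rows to a CSV file, skipping rows whose ID is already stored.

    `id_column` is a hypothetical column name used to recognise duplicates.
    """
    csv_path = Path(csv_path)
    # Compare IDs as strings so that all-digit IDs are not re-read as integers.
    new_data = new_data.astype({id_column: str})

    if csv_path.exists():
        existing = pd.read_csv(csv_path, dtype={id_column: str})
        combined = pd.concat([existing, new_data], ignore_index=True)
    else:
        combined = new_data.copy()

    # Keep the first occurrence of each ID, so previously saved rows win.
    combined = combined.drop_duplicates(subset=id_column, keep="first")
    combined.to_csv(csv_path, index=False)
    return combined
```

Keeping the first occurrence means an old row is never overwritten by a newer copy of the same post; passing keep="last" instead would prefer the freshly collected version.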

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)
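
To check an existing environment against these minimums, something like the following sketch may help. It uses only the standard library; the helper names (as_tuple, check_dependencies) are hypothetical, and the version comparison is deliberately simplified (the packaging library's Version class handles pre-releases and other edge cases more robustly).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {"pandas": "1.3.5", "praw": "7.5.0", "tqdm": "4.62.3"}


def as_tuple(version_string):
    """Turn '1.3.5' into (1, 3, 5), stopping at the first non-numeric part."""
    parts = []
    for part in version_string.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    # Pad to three components so '1.3' compares equal to '1.3.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return {package: status}; status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        new_enough = as_tuple(installed) >= as_tuple(minimum)
        report[name] = "ok" if new_enough else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(f"{name}: {status}")
```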

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data. The Reddit API is currently inconsistent in how it reports suspended and deleted authors, so this functionality has not been built yet.

Testing

After installation, you can launch the test suite, which is contained in tests/tests.py. Note that you will need pytest >= 6.2.5 installed. You can launch the test suite by following these steps from the project's root directory:

  1. Open up tests.py with the following command:
open tests/tests.py

  2. Comment out lines 24 to 30. Change the values in DataCollector() in line 32 to your Reddit credentials.

  3. Run the following command:
pytest tests/tests.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file were inspired by the scikit-learn README.

Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Spelling error of separate argument in to_pandas function of reddit_data_collector.io.py, previously it was spelt like seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science