Wikidot-forum-dump - Simple Python script that dumps a Wikidot wiki forum into JSON structures.

wikidot-forum-dump

This script is partially based on 2stacks by bluesoul: https://github.com/scuttle/2stacks

To dump a wiki's forum, edit config.py and set the wiki name (the default is scp-wiki, the SCP EN community).

If your forum is very large, you can adjust the threads parameter in config.py. Make sure you don't overload Wikidot with requests, though: no one knows what that may result in.
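
As a rough illustration of what a worker-count setting controls, here is a minimal sketch of bounded concurrency. The names WIKI, THREADS and fetch_all are assumptions made for this example; the actual names and behavior in config.py may differ.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical settings; the real names in config.py may differ.
WIKI = "scp-wiki"   # assumed name for the wiki setting
THREADS = 4         # assumed name for the worker-count setting


def fetch_all(items, fetch_one, workers=THREADS):
    """Run fetch_one over items with at most `workers` concurrent calls."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(fetch_one, items))


if __name__ == "__main__":
    # Stand-in for a network request so the sketch runs offline
    print(fetch_all([1, 2, 3], lambda thread_id: f"{WIKI}/thread-{thread_id}"))
```

A lower worker count means fewer simultaneous requests to Wikidot, at the cost of a slower dump.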

Then make sure the following Python package is installed:

beautifulsoup4

Run python . to start dumping.

Note: incremental dumping (i.e. updating existing categories or threads) is not supported.

However, if the process is interrupted at any point, it can be seamlessly resumed per-category and per-thread.
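
One common way to get this kind of resumability is to skip any output file that already exists and to write each file atomically. The sketch below shows that idea only; dump_once is a hypothetical helper, and the script itself may implement resuming differently.

```python
import json
import os
import tempfile


def dump_once(path, produce):
    """Write produce() to path as JSON unless the file already exists.

    Returns True if the file was written, False if it was skipped.
    """
    if os.path.exists(path):
        return False  # already finished in an earlier run
    data = produce()
    folder = os.path.dirname(path) or "."
    os.makedirs(folder, exist_ok=True)
    # Write to a temporary file first, then rename it into place, so an
    # interruption never leaves a half-written file that looks complete.
    fd, tmp = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)
    os.replace(tmp, path)
    return True
```

With this pattern, rerunning after an interruption only redoes the items whose files are missing.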

The dump will be stored under the following structure:

dump/
  categories.json           -- contains group and category names and IDs
  categories/
    <id>.json               -- contains list of threads for each category by ID
  threads/
    <id>.json               -- contains each thread with all posts and replies
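
A finished dump can be read back with the standard library alone. The following is a minimal sketch; load_dump is a hypothetical helper, and it searches the threads folder recursively so that it does not depend on exactly how the thread files are nested.

```python
import json
from pathlib import Path


def load_dump(root="dump"):
    """Return the parsed categories.json and a sorted list of thread files."""
    root = Path(root)
    with open(root / "categories.json", encoding="utf-8") as f:
        groups = json.load(f)
    # rglob finds thread files whether or not they sit in subfolders
    thread_files = sorted((root / "threads").rglob("*.json"))
    return groups, thread_files
```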

    
   

Example of a group+categories record (from SCP-EN):

  {
    "title": "Site Announcements and Proposals",
    "description": "Announce new pages, suggest policy, and interact with new site members.",
    "categories": [
      {
        "title": "Sitewide Announcements",
        "description": "Announcement of any sitewide changes or events. For usage by both staff and non-staff.",
        "id": 1113520
      },
      {
        "title": "Page Announcements",
        "description": "Announce posting of new pages and deletion of old pages. Authors, please use the collective SCP, Tale, GOI Entry, and Update threads for new works.",
        "id": 7409511
      },
      {
        "title": "Proposals And Policy",
        "description": "What can we do to improve the site? Ask any questions you may have regarding site structure and policy.",
        "id": 51015
      },
      {
        "title": "Introductions",
        "description": "New to the site? Introduce yourself and meet other site members here.",
        "id": 72352
      }
    ]
  }
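
Records shaped like the one above can be flattened into a lookup from category ID to a readable label. This is a small sketch that assumes categories.json holds a list of such group records; category_index is a hypothetical helper name.

```python
def category_index(groups):
    """Map each category ID to a 'Group / Category' label."""
    return {
        category["id"]: f'{group["title"]} / {category["title"]}'
        for group in groups
        for category in group["categories"]
    }


if __name__ == "__main__":
    sample = [
        {
            "title": "Site Announcements and Proposals",
            "description": "",
            "categories": [
                {"title": "Sitewide Announcements", "description": "", "id": 1113520},
                {"title": "Introductions", "description": "", "id": 72352},
            ],
        }
    ]
    for category_id, label in category_index(sample).items():
        print(category_id, label)
```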

Example of a thread record (also from SCP-EN):

{
  "breadcrumbs": [
    "Forum",
    "Site Announcements and Proposals / Sitewide Announcements",
    "Technical Issues Announcement"
  ],
  "category": 1113520,
  "base_page_id": null,
  "description": "An announcement regarding some Wikidot technical difficulties and mitigations, with particular reference to file uploads.",
  "user": {
    "avatar": "http://www.wikidot.com/avatar.php?userid=3075960&size=small&timestamp=1640941899",
    "name": "stormbreath",
    "id": 3075960
  },
  "date": 1640899718,
  "posts": [
    {
      "user": {
        "avatar": "http://www.wikidot.com/avatar.php?userid=3075960&size=small&timestamp=1640941899",
        "name": "stormbreath",
        "id": 3075960
      },
      "date": 1640899718,
      "title": "Technical Issues Announcement",
      "content": "Hey everyone.\nCurrently, we're having some Wikidot problems with the Master Admin position. We've moved the Master Admin position to a different account (under Mann's control) in the meantime, but there may be problems for a few days while we sort things out.\nThe thing most prominently affected by this will be uploaded files. Until the issue is resolved, please host images on an external hosting site like Imgur. After image uploading has resumed functioning, Staff will reupload the images on the mainsite manually. This will affect the sandbox as well.\nHelen (our IRC bot) will also be temporarily down until the problem has been resolved. This also means some Helen-adjacent functions may not be functional, or may experience issues.\nWe thank you for your cooperation during this. If you have any questions about the above or any other problems you come across, feel free to ask in this thread or message a staff member. Thank you.",
      "replies": [
        {
          "user": {
            "avatar": "http://www.wikidot.com/avatar.php?userid=3695324&size=small&timestamp=1640941899",
            "name": "CephalopodStevenson",
            "id": 3695324
          },
          "date": 1640903834,
          "title": "Re: Technical Issues Announcement",
          "content": "So, I’ll preface by saying that I have very little knowledge of how wikidot works. I hope somebody can enlighten me here and explain what any of this means. Why does a master admin problem affect media files? What exactly is the problem? Who was controlling the master admin account before, and is this an issue with said user or wikidot itself?\nAlso, “most prominently affected” suggests that there may be other concerns, right? I’m guessing this means we should be backing up our content regardless of media inserts. Again, I’m not super knowledgeable on any of this, so if someone could clarify a bit, I’d appreciate it.",
          "replies": [
            {
              "user": {
                "avatar": "http://www.wikidot.com/avatar.php?userid=2005044&size=small&timestamp=1640941899",
                "name": "Decibelles",
                "id": 2005044
              },
              "date": 1640905152,
              "title": "Re: Technical Issues Announcement",
              "replies": [],
              "content": "\nWhy does a master admin problem affect media files? What exactly is the problem? Who was controlling the master admin account before, and is this an issue with said user or wikidot itself?\n\nThis one's a bit of a tricky one to explain, as the answers are all interconnected. The Master Admin before was Mann. Mann has a Pro account. Wikidot's subscription plans allow a site to have a bunch of features that you cannot have if you make a site with a Free account. Among these features is an upgrade in storage space, IE, how much you can upload to a site. In addition to this, you can also buy more storage as well.\nThe master admin status presumably has shifted from Mann to another account, still under Mann's control. However, this other account presumably does not have Pro. So therefore, storage limits will have naturally been hit. I can't pretend to know what the issue was that resulted in the transfer of status, but this would end up resulting in storage now being at a premium until the situation can be resolved.\n\nI’m guessing this means we should be backing up our content regardless of media inserts.\n\nI'm not sure what anything else affected could be, but it is 100% always a good idea to back up your content from this site as best as you are able to."
            },
            {
              "user": {
                "avatar": "http://www.wikidot.com/avatar.php?userid=2199269&size=small&timestamp=1640941899",
                "name": "Yossipossi",
                "id": 2199269
              },
              "date": 1640905211,
              "title": "Re: Technical Issues Announcement",
              "content": "\n\n\n \n\n \n\n\n\nWhy does a master admin problem affect media files?\n\nAs we are transferring the Master Admin permission to a separate account without Pro Plus, until the new account can receive Pro Plus, we will have reduced image size for the Wiki. Thus, no new images can be uploaded.\n\nWhat exactly is the problem?\n\nThe exact issue we are trying to solve will not be disclosed at this time, however it is related to a Wikidot issue.\n\nWho was controlling the master admin account before, and is this an issue with said user or wikidot itself?\n\nMann was, and still is, Master Admin. The account was changed from DrEverettMann to a separate alt, which is still under Mann's control. This is solely a Wikidot problem.\n\nAlso, “most prominently affected” suggests that there may be other concerns, right? I’m guessing this means we should be backing up our content regardless of media inserts.\n\nIt is recommended to back up your articles regardless (Wikidot may go under at any moment, of course, but that's not new information). Other affected things are related to the alt's lack of Pro Plus (for the time being), such as HTTP instead of HTTPS and login issues for some users.\n\n\n \n\n\n\n\n ",
              "replies": [
                {
                  "user": {
                    "avatar": "http://www.wikidot.com/avatar.php?userid=3695324&size=small&timestamp=1640941899",
                    "name": "CephalopodStevenson",
                    "id": 3695324
                  },
                  "date": 1640905510,
                  "title": "Re: Technical Issues Announcement",
                  "replies": [],
                  "content": "Thanks both of you for the informative responses. I feel I have a much better understanding of the situation now."
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
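
Because replies nest inside posts to any depth, a recursive walk is a natural way to read a thread record. The sketch below assumes the date fields are Unix timestamps in seconds and prints them as UTC; iter_posts and describe are hypothetical helper names.

```python
from datetime import datetime, timezone


def iter_posts(posts, depth=0):
    """Yield (depth, post) for every post and nested reply, parents first."""
    for post in posts:
        yield depth, post
        yield from iter_posts(post.get("replies", []), depth + 1)


def describe(thread):
    """Return one indented line per post: author name and UTC time."""
    lines = []
    for depth, post in iter_posts(thread["posts"]):
        # Assumption: "date" is a Unix timestamp in seconds
        when = datetime.fromtimestamp(post["date"], tz=timezone.utc)
        indent = "  " * depth
        lines.append(f'{indent}{post["user"]["name"]} ({when:%Y-%m-%d %H:%M} UTC)')
    return lines
```

Calling len(list(iter_posts(thread["posts"]))) gives the total number of posts in a thread, replies included.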

Note: contrary to what its name suggests, base_page_id is not an ID but a slug (e.g. scp-173).

It is set only for page discussion threads.

Owner
ZZYZX