MyBB Community Forums

Full Version: importing posts from a web scrape of another forum
A 20-year-old non-MyBB forum that I was a member of recently shut down with very short notice. They were unwilling to provide any data dump, so I rushed to scrape the entire forum using 'wget' so that I'd have something.

I've managed to parse all of the posts in the web scrape into well-ordered JSON of the following format:

[
{
"thread_id": 2,
"post_tstamp": 1253489351,
"username": "user0101",
"post_id": 12345,
"title": "post title",
"post_body": "body text"
},
{
"thread_id": 3,
"post_tstamp": 1253491743,
"username": "user0117",
"post_id": 12367,
"title": "another title",
"post_body": "more body text"
}
]


The challenge at this point is importing all of this data into a MyBB 1.8.24 instance. I realize that I won't necessarily get a one-to-one match on "username" for every post, and I'm OK with that (I'll just assign whatever fails to match to some 'default' user account).
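
A minimal sketch of that fallback, assuming the existing accounts have already been read into a username -> uid dictionary. The names known_users, DEFAULT_UID and resolve_uid are hypothetical, and the uid values are made up for illustration:

```python
# Hypothetical lookup of existing MyBB accounts: username -> uid.
# In practice this would be read from the users table; these values are made up.
known_users = {"user0101": 5, "user0117": 9}

DEFAULT_UID = 2  # assumed uid of a pre-created catch-all account


def resolve_uid(username, users=known_users, default_uid=DEFAULT_UID):
    """Return the uid for a scraped username, or the fallback uid if unmatched."""
    # Compare case-insensitively and ignore stray whitespace from the scrape.
    normalized = {name.strip().lower(): uid for name, uid in users.items()}
    return normalized.get(username.strip().lower(), default_uid)


print(resolve_uid("User0101 "))   # matched account -> 5
print(resolve_uid("ghost_user"))  # no match, falls back -> 2
```

Whether case-insensitive matching is appropriate depends on how the old forum treated usernames, so that part is a judgment call.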

I think that the two MyBB database tables that I'd need to insert this data into are mybb_posts and mybb_threads. Can I safely map my JSON to the following table fields?

thread_id -> tid
post_tstamp -> dateline
post_id -> pid
title -> subject
post_body -> message
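
For illustration, the renaming above could be sketched in Python like this. FIELD_MAP and to_row are hypothetical names, and the sketch only renames keys; it does not cover the other columns the tables may require:

```python
import json

# JSON key -> MyBB column, exactly as listed above.
FIELD_MAP = {
    "thread_id": "tid",
    "post_tstamp": "dateline",
    "post_id": "pid",
    "title": "subject",
    "post_body": "message",
}


def to_row(record):
    """Rename the mapped keys; leave everything else (e.g. username) untouched."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}


scraped = json.loads("""
[
  {"thread_id": 2, "post_tstamp": 1253489351, "username": "user0101",
   "post_id": 12345, "title": "post title", "post_body": "body text"}
]
""")

rows = [to_row(r) for r in scraped]
print(rows[0])
```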

One thing that I'm unclear about is how to handle the post_id (pid) values. Should I just let the sequence figure it out for every insert and ignore what's in my JSON dump?

Beyond that, which other columns do I need to account for when inserting this data?  What can be left to a default value?  Is there any other table that needs to be accounted for as well?


thanks!
Since there's only one forum you'd like to convert, the easiest way IMO might be to preserve the IDs from your JSON data. That means you won't need to deal with posts -> thread mapping.

Apart from the mappings you've mentioned, you should also pay attention to the following field in the MyBB database (in table_name.column_name form):
threads.firstpost

Another important table is the users table; you should keep uid consistent across the users, threads, and posts tables.

If there are no forums/sections/categories in your JSON data, you still need to create a forum, or just use the auto-generated forum in a MyBB installation, to hold all the threads and posts. Its ID goes in the fid column of the threads and posts tables.

If there's any poll data, the polls and pollvotes tables are what you need to look into as well.
(2020-09-21, 06:54 PM)noyle Wrote: Since there's only one forum you'd like to convert, the easiest way IMO might be to preserve the IDs from your JSON data. That means you won't need to deal with posts -> thread mapping.


Thanks for the reply!  I have a few additional questions:

  1. Does threads.firstpost hold the posts.pid of the post with the lowest dateline for a given posts.tid?
  2. If I don't keep track of posts -> thread mapping, how would I ensure that all the imported posts are associated with the correct thread?
  3. Is it safe to arbitrarily assign the next unused pid to every post that I'm importing, or should I let the database worry about new pids?
thanks!
(2020-09-22, 12:07 AM)netllama Wrote:
  • Does threads.firstpost hold the posts.pid of the post with the lowest dateline for a given posts.tid?

That depends on your data. But yes, usually the earliest post in a thread is the first post.
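
As a rough sketch of that rule, assuming the posts are already keyed by MyBB column names (tid, pid, dateline), the value for threads.firstpost could be derived per thread as follows. The function name first_posts is hypothetical, and breaking timestamp ties by lowest pid is an assumption:

```python
def first_posts(posts):
    """Map each tid to the pid of its earliest post (ties broken by lowest pid)."""
    earliest = {}
    for post in posts:
        key = (post["dateline"], post["pid"])
        tid = post["tid"]
        if tid not in earliest or key < earliest[tid]:
            earliest[tid] = key
    return {tid: pid for tid, (_, pid) in earliest.items()}


posts = [
    {"tid": 2, "pid": 12350, "dateline": 1253489900},
    {"tid": 2, "pid": 12345, "dateline": 1253489351},
    {"tid": 3, "pid": 12367, "dateline": 1253491743},
]
print(first_posts(posts))  # {2: 12345, 3: 12367}
```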

(2020-09-22, 12:07 AM)netllama Wrote:
  • If I don't keep track of posts -> thread mapping, how would I ensure that all the imported posts are associated with the correct thread?

If you don't do that, or your data doesn't track that mapping, I'm afraid there won't be a clear way.

(2020-09-22, 12:07 AM)netllama Wrote:
  • Is it safe to arbitrarily assign the next unused pid to every post that I'm importing, or should I let the database worry about new pids?

Yes, you may assign any unused ID to any post/thread/user while importing. That also means you can leave the database to auto-increment a table's ID, which is how the MyBB Merge System does it, as long as you keep track of the mapping from old IDs to new IDs to make sure that 1) each post is associated with its thread and 2) each post or thread is associated with its author (and its forum, if you have any).
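
To illustrate the old-ID to new-ID bookkeeping, here is a small sketch. It uses an in-memory SQLite database purely as a stand-in for the real MySQL one, with two cut-down tables that are not the actual MyBB schema; tid_map and pid_map are hypothetical names. It also assumes the records are sorted so that a thread's starter post comes first:

```python
import sqlite3

# In-memory SQLite stands in for the real database; the tables are cut down
# to the columns needed to show the ID bookkeeping (not the MyBB schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE threads (tid INTEGER PRIMARY KEY AUTOINCREMENT, subject TEXT)")
db.execute("CREATE TABLE posts (pid INTEGER PRIMARY KEY AUTOINCREMENT, tid INTEGER, message TEXT)")

scraped = [
    {"thread_id": 2, "post_id": 12345, "title": "post title", "post_body": "body text"},
    {"thread_id": 2, "post_id": 12350, "title": "RE: post title", "post_body": "a reply"},
    {"thread_id": 3, "post_id": 12367, "title": "another title", "post_body": "more body text"},
]

tid_map = {}  # old thread_id -> new tid
pid_map = {}  # old post_id -> new pid

for rec in scraped:
    old_tid = rec["thread_id"]
    if old_tid not in tid_map:
        # First time this thread is seen: create it and remember the new ID.
        cur = db.execute("INSERT INTO threads (subject) VALUES (?)", (rec["title"],))
        tid_map[old_tid] = cur.lastrowid
    # Insert the post under the thread's new tid and record its new pid.
    cur = db.execute(
        "INSERT INTO posts (tid, message) VALUES (?, ?)",
        (tid_map[old_tid], rec["post_body"]),
    )
    pid_map[rec["post_id"]] = cur.lastrowid

print(tid_map)  # {2: 1, 3: 2}
print(pid_map)  # {12345: 1, 12350: 2, 12367: 3}
```

With the two maps in hand, any column that refers to a pid or tid can be translated from the old numbering to the new one after the inserts.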
Thanks. I was able to successfully import the data, and it's viewable in the forum. The only issue that I'm currently seeing is that the 'Threads', 'Posts' and 'Last Post' data is not updating on the site index page.

After some digging around, I realized that I also needed to update the mybb_forums table to reflect what was imported.
I think you just need to run a recount & rebuild in the Admin CP.

And further, rebuild & reload all caches.
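
As a sanity check after the recount, the figures shown on the index page can be compared against numbers derived from the imported data. This is only a sketch: forum_summary is a hypothetical helper, it assumes a single forum, and exactly how MyBB counts posts (for example, whether hidden posts are included) is worth verifying against your installation:

```python
def forum_summary(posts):
    """Summarise imported posts: thread count, post count, and the latest post."""
    latest = max(posts, key=lambda p: (p["dateline"], p["pid"]))
    return {
        "threads": len({p["tid"] for p in posts}),
        "posts": len(posts),
        "lastpost": latest["dateline"],
        "lastposter": latest["username"],
    }


imported = [
    {"tid": 2, "pid": 12345, "dateline": 1253489351, "username": "user0101"},
    {"tid": 2, "pid": 12350, "dateline": 1253489900, "username": "user0117"},
    {"tid": 3, "pid": 12367, "dateline": 1253491743, "username": "user0117"},
]
print(forum_summary(imported))
```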