2020-09-21, 01:50 AM
A 20-year-old non-myBB forum that I was a member of recently shut down with very short notice. They were unwilling to provide any data dump, so I rushed to scrape the entire forum using 'wget' so that I'd have something.
I've managed to parse all of the posts in the web scrape into well-ordered JSON of the following format:
[
  {
    "thread_id": 2,
    "post_tstamp": 1253489351,
    "username": "user0101",
    "post_id": 12345,
    "title": "post title",
    "post_body": "body text"
  },
  {
    "thread_id": 3,
    "post_tstamp": 1253491743,
    "username": "user0117",
    "post_id": 12367,
    "title": "another title",
    "post_body": "more body text"
  }
]
The challenge at this point is importing all of this data into a myBB 1.8.24 instance. I realize that I won't necessarily get a one-to-one match on "username" for every post, and I'm OK with that (I'll just assign whatever fails to match to some 'default' user account).
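To illustrate the fallback, here is a minimal Python sketch. It assumes the existing accounts have already been read into a dict keyed by username. The names `resolve_uid`, `known_users`, and `DEFAULT_UID`, the uid values, and the case-insensitive comparison are all placeholders and assumptions for illustration, not anything taken from myBB:

```python
# Hypothetical lookup of existing accounts: username -> uid.
# In practice this would be read from the users table; the value is made up.
known_users = {"user0101": 7}

# Hypothetical uid of the catch-all 'default' account.
DEFAULT_UID = 1


def resolve_uid(username, users=known_users, default_uid=DEFAULT_UID):
    """Return the uid for username, or the default uid when nothing matches."""
    # Assumption: compare case-insensitively and ignore surrounding whitespace.
    normalized = {name.strip().lower(): uid for name, uid in users.items()}
    return normalized.get(username.strip().lower(), default_uid)
```

With this, `resolve_uid("user0101")` returns the matched uid, while an unknown name such as `"user0117"` falls back to `DEFAULT_UID`.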
I think the two myBB database tables that I'd need to insert this data into are mybb_posts and mybb_threads. Can I safely map my JSON to the following table fields?
thread_id -> tid
post_tstamp -> dateline
post_id -> pid
title -> subject
post_body -> message
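The mapping listed above could be sketched in Python roughly as follows. This only renames keys; it does not claim the mapping is sufficient or correct for myBB, which is the open question. `FIELD_MAP` and `to_post_row` are names made up for this example, and the sample record is the placeholder from the JSON format shown earlier:

```python
import json

# Source key -> target column, exactly as listed above.
FIELD_MAP = {
    "thread_id": "tid",
    "post_tstamp": "dateline",
    "post_id": "pid",
    "title": "subject",
    "post_body": "message",
}


def to_post_row(record):
    """Rename the scraped keys to the proposed column names."""
    row = {column: record[key] for key, column in FIELD_MAP.items()}
    # Keep the original username so it can be matched to an account later.
    row["username"] = record["username"]
    return row


sample = json.loads("""
[{"thread_id": 2, "post_tstamp": 1253489351, "username": "user0101",
  "post_id": 12345, "title": "post title", "post_body": "body text"}]
""")
rows = [to_post_row(record) for record in sample]
```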
One thing that I'm unclear on is how to handle the post_id (pid) values. Should I just let the auto-increment sequence assign a new pid for every insert, and ignore what's in my JSON dump?
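If the database assigns the ids, one option is to insert posts oldest first and record which new id each old post_id received, so the original values are not lost. The Python sketch below shows that idea only. It uses an in-memory SQLite table as a stand-in; the `posts` table and the `old_to_new` dict are hypothetical and not the real myBB schema:

```python
import sqlite3

posts = [
    {"post_id": 12367, "post_tstamp": 1253491743, "post_body": "more body text"},
    {"post_id": 12345, "post_tstamp": 1253489351, "post_body": "body text"},
]

# In-memory SQLite stands in for the real database; this table is a toy,
# not the actual myBB schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (pid INTEGER PRIMARY KEY AUTOINCREMENT,"
    " dateline INTEGER, message TEXT)"
)

old_to_new = {}
# Insert oldest first so the generated ids follow chronological order.
for post in sorted(posts, key=lambda p: (p["post_tstamp"], p["post_id"])):
    cur = conn.execute(
        "INSERT INTO posts (dateline, message) VALUES (?, ?)",
        (post["post_tstamp"], post["post_body"]),
    )
    # lastrowid is the id the database just generated for this row.
    old_to_new[post["post_id"]] = cur.lastrowid
conn.commit()
```

Keeping `old_to_new` around would make it possible to rewrite any links or quotes in the scraped bodies that refer to the old post ids.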
Beyond that, which other columns do I need to account for when inserting this data? What can be left to a default value? Is there any other table that needs to be accounted for as well?
Thanks!