MyBB Community Forums

Full Version: MyBB database's Quality
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2 3 4 5
Hey everyone,

I'm studying databases right now at the university and we've just learned something about "Database Quality" and "Normal Forms", there are many normal forms (I hope some of you know what I'm talking about) 1NF, 2NF, 3NF, BCNF, 4NF...., the 3NF and BCNF are the best ones, easier to manage with NO redundant information, and answers queries 10x to 100x faster...

http://en.wikipedia.org/wiki/First_normal_form
http://en.wikipedia.org/wiki/Second_normal_form
http://en.wikipedia.org/wiki/Third_normal_form
http://en.wikipedia.org/wiki/Boyce%E2%80...ormal_form

Now I know that this might not get your interest, but I've just checked MyBB database and it's not even 1NF...

It's really easy to improve a database's quality, though, this will need to edit all the PHP code behind, so I suggest to put this as a suggestion for MyBB 2.0... PHP code will be reduced, because every single data is stored ONE time in ONE appropriate table only in the database, you don't have to update it everywhere, and you're sure you won't miss any data or have redundant data...

I don't think many bulletin board really care about that fact, but I think we should...


Because the forums are growing, and we should really think of something to improve the database quality, (I remember last day when I tried to search the word "time" on this forum, it was giving me a blank page)

So I make this suggestion for MyBB 2.0, what do you think?
I'm fairly sure the MyBB database structure is already pretty well normalised - data is mostly atomic within the core (unless I'm very much mistaken).
But you need to consider number of writes versus number of reads. In a system where reads are typically 10,000x the number of writes, duplicating data over multiple tables is much simpler and a more efficient method.

Currently the most widely support DB engine in MySQL is MyISAM, which does table locking. Running a join over multiple tables will lock all the tables involved. However, if you only need to access a single table, then there is no locking of multiple tables at the same time.

Sure writing will have to happen multiple times but reading will greatly benefit from the redundant data.

the MyShowcase plugin I write seems to meet 2NF, if not 3NF for most tables, but is is very much a PITA to code for. Lots of tables and lots of joins do deal with just for one set of results.
(2011-12-09, 09:06 PM)pavemen Wrote: [ -> ]But you need to consider number of writes versus number of reads. In a system where reads are typically 10,000x the number of writes, duplicating data over multiple tables is much simpler and a more efficient method.

Currently the most widely support DB engine in MySQL is MyISAM, which does table locking. Running a join over multiple tables will lock all the tables involved. However, if you only need to access a single table, then there is no locking of multiple tables at the same time.

Sure writing will have to happen multiple times but reading will greatly benefit from the redundant data.

the MyShowcase plugin I write seems to meet 2NF, if not 3NF for most tables, but is is very much a PITA to code for. Lots of tables and lots of joins do deal with just for one set of results.

Agreed, a database that is only read don't even need to be normalised, more than that, it will be better to denormalize it (profesionally of course)...

http://en.wikipedia.org/wiki/Denormalization

But for MyBB software's database (which is not only being read by users), I think it will be a huge benefit to normalize it into 3NF or BCNF...
After you define the rows and tables you will need for your software, normalizing a database is just a game (really), it's not hard!
but unless your forum has a significant amount of new content relative to the views, it is more or less read-only. I can see very large forums (NCAAbbs, Hackforums, etc) benefiting from it, but most sites will not benefit and it only makes for more complex coding and debugging.
That's not true, coding is much simpler when using a normalized database, let's say you have one table

Table(userID, userName, birthday, age, friendUID, lastPostID)
userID determines userName, emailAddress and lastPostID, birthday determines the age, (userID is the primary key)

This table is 2NF (which is not bad), yet you will have many redundant information, imagine one user has 4 friends, he will have 4 rows in that table, same user userID, userName, birthday, age and lastPostID, but the friendUID changes each time...


UserID | userName | birthday | age | friendUID | lastPostID
---------------------------------------------------------
12     | user1    | 11/11/90 | 21  | 13        | 1920
12     | user1    | 11/11/90 | 21  | 18        | 1920
12     | user1    | 11/11/90 | 21  | 5         | 1920
12     | user1    | 11/11/90 | 21  | 2         | 1920


---------

Which means the PHP code will have to update all the rows and all the redundant information, and when you need to delete that user, you will have to delete all the rows...
That seems easy right? but that's an example of one simple table and I've done simple queries (update / delete)... Imagine that userID is repeated many times in 40 tables and each time the username and other things are repeated also, updating a username is a real drugdery..
But if the username and the other infos exist only ONE time in the database, updating them is a game... (right?)

(MyBB's current database scheme does that, when a user creates a thread, it's user ID and username are stored in the thread table, even if they're already in the users table, and plenty of other examples can be found)

When something is beneficial (less coding, faster results and better software) why not doing it?

Anyway that was just my suggestion (I'm sure those who created the database normalization thing are clever enough to know it will just save time.
I agree that normalization can be beneficial in some case.

I understand that there are a few redundant fields in the MyBB schema, like usernames in forum, thread and post tables. The issue is how often are those records/fields updated relative to how often they are queried? Its a matter of priorities.

Can you share all the instances of duplicated/redundant fields? I can only really think about the username field and maybe the fid field in the posts table (since it already has tid, which you can then obtain fid with via threads table). UIDs are required as that is the join key for most tables.

In my opinion, using multiple joins to get the data you need in a 3NF schema is harder to code and debug than a simple query. This is especially true when it comes to update and delete queries, where joins are not always supported thus requiring multiple queries. Example, one to get UIDs from a list of usernames, then another to delete records where UIDs match. However, this is an RDBMS specific limitation.

It can be more efficient in terms of storage space and query results for a given data request since 3NF and similar benefit from typically integer based keys and searching, which are smaller to store and faster to search than strings.

So it has pro's and con's, like anything.

I compared a query for posts where username="bob" from posts table to querying posts where username="bob" joining on uids to the user table and the simple query was 6x faster averaging 10 queries of each. Add up that time and memory to a busy site and it can add up.
That's because you don't think of a normalized database, in a normalized database, only a key should be used...
Example, in your above example, you wouldn't manage to find the posts of "bob" but of bob's ID (example 132), and that's as easy as your previous query...

Bob's username is stored one time, and you can easily get it by requesting the 'users'...
Because typically anything that is not a "key" isn't unique... If the username is unique (is a key), then you won't need the user ID...

Re-think of it Wink
(2011-12-09, 10:34 PM)TheGarfield Wrote: [ -> ]Bob's username is stored one time, and you can easily get it by requesting the 'users'...
Because typically anything that is not a "key" isn't unique... If the username is unique (is a key), then you won't need the user ID...

Yes, but the username can change. The UID can't. So every time you changed your username it would have to edit a ton of rows with the new username, which would be incredibly inefficient.

I guess i shouldn't say the UID can't change, as I have seen MANY people request help because their forum is messed up when they changed their UID. Confused
^
LOL that's what I've said to pavemen Smile
What I said is username is unreliable, but user ID is reliable Smile

Re-read Wink
Pages: 1 2 3 4 5