MyBB Community Forums

Full Version: Performance Suggestion
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2
This is a big one. I'm having to go through all my queries and functions looking for some deep problem areas. I just found one. So I'm going to immediately ask that MyBB team review this and discuss if what I'm saying is right. This effects performance for all MyBB admin.

This about the function fetch_unread_count() inside inc/functions_indicators.php. This function basically checks to make sure any unread forums will display the indicator that a new posts/thread exists.

I'm investigating this carefully and I find this is only called on one page forumdisplay.php.

// If there are no unread threads in this forum and no unread child forums - mark it as read
require_once MYBB_ROOT."inc/functions_indicators.php";
if(fetch_unread_count($fid) == 0 && $unread_forums == 0)
{
	mark_forum_read($fid);
}

Essentially if unread_count returns 0 then the forums are mark read and indicator should be clear. However if it returns 1 or more then we should see the indicator.

But let's review the actual function please.

function fetch_unread_count($fid)
{
	global $db, $mybb;

	$cutoff = TIME_NOW-$mybb->settings['threadreadcut']*60*60*24;

	if($mybb->user['uid'] == 0)
	{
		$comma = '';
		$tids = '';
		$threadsread = unserialize($mybb->cookies['mybb']['threadread']);
		$forumsread = unserialize($mybb->cookies['mybb']['forumread']);
		if(is_array($threadsread))
		{
			foreach($threadsread as $key => $value)
			{
				$tids .= $comma.intval($key);
				$comma = ',';
			}
		}
		
		if(!empty($tids))
		{
			$count = 0;
			
			// We set a limit to 100 otherwise it'll become too processor intensive, especially if we have many threads.
			$query = $db->query("
				SELECT lastpost, tid, fid
				FROM ".TABLE_PREFIX."threads
				WHERE visible=1 AND closed NOT LIKE 'moved|%' AND fid IN ($fid) AND tid IN ($tids) AND lastpost > '{$cutoff}'
				LIMIT 100
			");
			while($thread = $db->fetch_array($query))
			{
				if($thread['lastpost'] > intval($threadsread[$thread['tid']]) && $thread['lastpost'] > intval($forumsread[$thread['fid']]))
				{
					++$count;
				}
			}
			return $count;
		}
	}
	else
	{
		switch($db->type)
		{
			case "pgsql":
				$query = $db->query("
					SELECT COUNT(t.tid) AS unread_count
					FROM ".TABLE_PREFIX."threads t
					LEFT JOIN ".TABLE_PREFIX."threadsread tr ON (tr.tid=t.tid AND tr.uid='{$mybb->user['uid']}')
					LEFT JOIN ".TABLE_PREFIX."forumsread fr ON (fr.fid=t.fid AND fr.uid='{$mybb->user['uid']}')
					WHERE t.visible=1 AND t.closed NOT LIKE 'moved|%' AND t.fid IN ($fid) AND t.lastpost > COALESCE(tr.dateline,$cutoff) AND t.lastpost > COALESCE(fr.dateline,$cutoff) AND t.lastpost>$cutoff
				");
				break;
			default:
				$query = $db->query("
					SELECT COUNT(t.tid) AS unread_count
					FROM ".TABLE_PREFIX."threads t
					LEFT JOIN ".TABLE_PREFIX."threadsread tr ON (tr.tid=t.tid AND tr.uid='{$mybb->user['uid']}')
					LEFT JOIN ".TABLE_PREFIX."forumsread fr ON (fr.fid=t.fid AND fr.uid='{$mybb->user['uid']}')
					WHERE t.visible=1 AND t.closed NOT LIKE 'moved|%' AND t.fid IN ($fid) AND t.lastpost > IFNULL(tr.dateline,$cutoff) AND t.lastpost > IFNULL(fr.dateline,$cutoff) AND t.lastpost>$cutoff
				");
		}
		return $db->fetch_field($query, "unread_count");
	}
}

That's fairly big function but what really catches my eye is the query.

				$query = $db->query("
					SELECT COUNT(t.tid) AS unread_count
					FROM ".TABLE_PREFIX."threads t
					LEFT JOIN ".TABLE_PREFIX."threadsread tr ON (tr.tid=t.tid AND tr.uid='{$mybb->user['uid']}')
					LEFT JOIN ".TABLE_PREFIX."forumsread fr ON (fr.fid=t.fid AND fr.uid='{$mybb->user['uid']}')
					WHERE t.visible=1 AND t.closed NOT LIKE 'moved|%' AND t.fid IN ($fid) AND t.lastpost > IFNULL(tr.dateline,$cutoff) AND t.lastpost > IFNULL(fr.dateline,$cutoff) AND t.lastpost>$cutoff
				");


That's a monster. But I have to ask..why is there not a LIMIT command? A LIMIT 1 should greatly increase speed and performance of this query. Because the count only has to be 0 or 1 for the feature to work.

Please examine what I'm stating and if I've made an error in my judgement please let me know.

btw above that query is one with a LIMIT of 100. Why not just make it LIMIT 1?
Have you tried this out in public? Does it have any regressions you might have found?
If you want to optimize there, just remove the if condition entirely. It will change the semantics of the marker (forum marked read when user was shown the list of threads in a forum, regardless whether those threads were read or not), but in a large forum, who would notice the difference or even care? The forum markers are entirely useless there, as they'll be always on anyway - in a large forum there are new posts everywhere all the time. You could replace them with a static image and almost no one would notice.

It's small forums where these markers do matter. And unfortunately they don't work right - it's the largest weak point of MyBB.
@Tomm - I've made the change at HF. Function seems to work as expected without any negatives. You can view it yourself. Mark forum as read then wait a minute or two and refresh. You'll see the function working fine.

The query seems fine to me. But it should be limited to one result because how it's used it basically just adding the indicator if a TRUE for unread_count. $unread_count can be 1000 or 20 and it's function will be the same. Only changes required to make this work as I've described is to change the LIMIT 100 to LIMIT 1 and add a LIMIT 1 to the other queries for default and pgsql.

Anyone else want to experiment with this change?


@frostschutz yes you're right. Might be best to completely remove the function. I'm trying my hardest to maintain the MyBB tree. I feel that certain optimizations can be done still that are within MyBB. I'm still a believer in running the software as close to base as possible.

Quote:And unfortunately they don't work right - it's the largest weak point of MyBB

This change may actually help that problem.
limit 1 won't give the correct result, this way it just checks one random thread whether it's read or not and marks the forum based on that. so the result what happens with the forum bulb is random.

that's still the case for limit 100 but the probability to have missed an unread thread when you looked at 100 of them, is far lower.

technically the logic of this function is borked either way
Quote:limit 1 won't give the correct result, this way it just checks one random thread whether it's read or not and marks the forum based on that. so the result what happens with the forum bulb is random.

Wrong. It's not a random thread at all. It grabs any thread within the WHERE statement. Just one thread filling that query means it works correctly.

Look at this again.
// If there are no unread threads in this forum and no unread child forums - mark it as read
require_once MYBB_ROOT."inc/functions_indicators.php";
if(fetch_unread_count($fid) == 0 && $unread_forums == 0)
{
    mark_forum_read($fid);
} 

It's just looking for 0. No reason to return 100, 2000, 300000. Either 1 or 0 will suffice.

It views threadsread and forumsread table so it's knows last read.

Here is the query I see in debug as a logged in member...so it's the longer version.

SELECT COUNT(t.tid) AS unread_count FROM mybb_threads t LEFT JOIN mybb_threadsread tr ON (tr.tid=t.tid AND tr.uid='1') LEFT JOIN mybb_forumsread fr ON (fr.fid=t.fid AND fr.uid='1') WHERE t.visible=1 AND t.closed NOT LIKE 'moved|%' AND t.fid IN (25) AND t.lastpost > IFNULL(tr.dateline,1294508180) AND t.lastpost > IFNULL(fr.dateline,1294508180) AND t.lastpost>1294508180 LIMIT 1

It joins the forumsread and threadsreads and checks it with tr.dateline and fr.dateline. It's searching for a new thread. LIMIT 1 is only going to grab one result if it exists. Otherwise it will search whole table to get 0. But without a limit it's going to count the entire threads table but only one result is required for this function to work correctly.

The function with the limit 100 is probably for guests but still....that's not well thought out either because one result means you see the on gif. So just search for one.

If I'm wrong about how that works please lmk. I'm positive that this query can be optimized anyways. It's sloppy to do a whole table scan just to find one new thread.
                        // We set a limit to 100 otherwise it'll become too processor intensive, especially if we have many threads.
                        $query = $db->query("
                                SELECT lastpost, tid, fid
                                FROM ".TABLE_PREFIX."threads
                                WHERE visible=1 AND closed NOT LIKE 'moved|%' AND fid IN ($fid) AND tid IN ($tids) AND lastpost > '{$cutoff}'
                                LIMIT 100
                        ");
                        while($thread = $db->fetch_array($query))
                        {
                                if($thread['lastpost'] > intval($threadsread[$thread['tid']]) && $thread['lastpost'] > intval($forumsread[$thread['fid']]))
                                {
                                        ++$count;
                                }
                        }
                        return $count;

For each thread it gets from the query, it checks if that thread is read (the query itself doesn't tell). Only if it is unread does the count get increased. Looking at one random thread, it could be either read, or unread, and the forum would get marked based on that. Result is random.

And yeah sure it could be optimized. It uses threadreadcut here, when it could use the current forumread (if it's larger), which in turn would yield a way shorter list of threads (only threads of the last 30 minutes instead of the entire day, or 60 days as it would be in my forum Smile ) for example.

Or you could still just remove it altogether. It doesn't achieve what it wants to achieve anyway.
I can only speak on the bottom part where you have a uid and it's grabbing data from threadsread table. That's my biggest concern.

And viewing HF as a guest. It seems to be working perfectly well. Test it yourself.

But if team could review this function carefully for optimizing that would be great.

EDIT: I just thought of something.

This entire function is trying to see if there are new threads but why not convert it to be new posts and just grab the data from forums table which keeps track of lastpost date. A much faster query is to check lastpost against users lastvisit. We only need a true or false result in this function and that's what we need to look for. The quickest way to say "yes new posts". We only want one result and count is worthless.
that's basically how it works but not what it's about in this particular piece of code

the forum is unread if last post newer than forumread, it's used that way already

what happens here is that it tries to decide whether to update the forum read timestamp, depending on whether there are still unread threads in it. and the last post does not necessarily help you in that regard. the last post can be read while an older thread might not be.

what you're suggesting is the same thing what I'm suggesting except that you can just update the forumread unconditionally then. otherwise if you don't update it if there's a post newer than current markread then it will never be updated because the post is always there

basically the question here is:

if you have a forum showing up as unread
and you go into the forum and there are 2 unread threads
and you read one (the newer one, doesn't matter) of the two
and you go back to the forum, and back to the index

should the forum still show up as unread because there is still an unread thread in there?

Currently it does show as unread and for that, it needs to go through the list of threads and check for each one whether its read or not.

The alternative to that would be skipping that entirely and just marking the forum as read because you did visit the forum and did see the list of new threads. So the list of threads is read, the threads themselves are not.

Again, in a large forum those semantics don't matter at all because there will always be a new post and when there is a new post it won't matter whether you decided to mark the forum read or not, it will be unread either way then.

You can return random 0 or 1 in that function and you probably wouldn't notice the difference in HF.

The ones who do notice is the small forums, where it's considered nice to be pointed at the still unread threads, by the forum markers.

EDIT:

A different solution that would work without (extra) queries would be marking the forum read or unread depending whether there are unread threads on the first page of the forum. This information is available in forumdisplay.php anyway (as it displays the first page of threads, with unread threads highlighted), so it could just count those there and update based on that. And unread threads on page 2 we just brush under the carpet.

That'd be the compromise that keeps the current semantics (more or less) but doesn't affect performance of larger forums at all.

EDIT2:

Naturally that'd only work for forumdisplay, you'd still need to query the [first page of] threads when actually reading a thread.
We agree then that this could be changed.

Quote:if you have a forum showing up as unread
and you go into the forum and there are 2 unread threads
and you read one (the newer one, doesn't matter) of the two
and you go back to the forum, and back to the index

should the forum still show up as unread because there is still an unread thread in there?

IMHO the answer is YES. It's a forumread marker not a threadread marker. If you clicked into forumdisplay then it should be marked as read.

This one single function has to be rewritten to return a true or false with the least possible query overhead.

How it is now is not acceptable.
Pages: 1 2