MyBB Community Forums

Full Version: [Performance] Stats Page and Todays Top Poster
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2
So I was alerted today that my stats page took about 30 seconds to load. Sure enough the stats page did take forever and upon debug I saw that it was the "todays top poster" stats on stats.php.

Checking it out I saw the infamous GROUP BY statement.

Here is original query

	default:
		$query = $db->query("
			SELECT u.uid, u.username, COUNT(*) AS poststoday
			FROM ".TABLE_PREFIX."posts p
			LEFT JOIN ".TABLE_PREFIX."users u ON (p.uid=u.uid)
			WHERE p.dateline > $timesearch
			GROUP BY p.uid ORDER BY poststoday DESC
			LIMIT 1
		");
}
$user = $db->fetch_array($query);

So what this does is join users table, grabs the uid and pids from posts then groups them by uid and finally orders by the count...with the limit 1 it will return todays top poster.

However even with limit 1 it's going to do a complete table search as well as a join. Then the GROUP BY statement will place it all in a temporary table. My 4Gb posts table is of course not going to be optimal for this query to run. So I began to tear it apart looking for a solution. This is my end result:

	default:
		$query = $db->simple_select("posts","pid,uid","dateline > {$timesearch}");
}

// Custom
while ($user = $db->fetch_array($query))
{
	$count[] = $user[uid];
}

$test = array_count_values($count);
arsort($test);
$user = get_user(key($test));
$user['poststoday'] = reset($test);
// Custom

And while I'm nearly positive my hack isn't optimally written. I do know that I'm getting the same results back in a much faster query (Query Time: 0.0951268672943). I do however do one additional query with get_user() but that's very quick as well (Query Time: 0.00208878517151). Both queries use indexes.

What my hack does is use PHP instead of MySQL in order to parse results. For small forums with less than 100k posts and only on the stats page this one query is not a problem. But as posts table grows you do not want to see table locks while this query runs slowing your whole site down.

I'm curious if any other big boarders want to try out my hack and compare the before and after of their stats.php debug.

Here are comparisons with SF and HF.

SF Before:Generated in 0.1038280 seconds (49.24% PHP / 50.76% MySQL)
SF After: Generated in 0.1028390 seconds (50.13% PHP / 49.87% MySQL)

HF Before: Generated in 27.9128520 seconds (0.34% PHP / 99.66% MySQL)
HF After: Generated in 0.6591330 seconds (27.88% PHP / 72.12% MySQL)

So end results are well worth it and are not detrimental for large or small forums but beneficial for large forums. MYBB should avoid GROUP BY on large tables at all costs.

Also a benefit of my change is the pgsql fix would not be needed as $db->simple_select works fine.
I just tried this, it's noticeably faster, shaved off some loading time for sure.

Thanks labrocca
Can you post before and after stats in the same format I did? Also how many posts are in your forum? Might help with showing mybb performance difference and give us a permanent core file change. It's actually less lines of code for my change.
#11 - Select Query
SELECT u.uid, u.username, COUNT(*) AS poststoday FROM mybb_posts p LEFT JOIN mybb_users u ON (p.uid=u.uid) WHERE p.dateline > 1296245939 GROUP BY p.uid ORDER BY poststoday DESC LIMIT 1
table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
p 	range 	dateline 	dateline 	8 		16 	Using where; Using temporary; Using filesort
u 	eq_ref 	PRIMARY 	PRIMARY 	4 	forum.p.uid 	1 	
Query Time: 0.000230073928833

Well, there weren't any posts today - so at least the p.dateline condition seems to be doing the right thing after all.

Much more expensive queries on the same page:
#8 - Select Query
SELECT regdate FROM mybb_users ORDER BY regdate LIMIT 1
table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
mybb_users 	ALL 					100001 	Using filesort
Query Time: 0.482570171356
#12 - Select Query
SELECT COUNT(*) AS count FROM mybb_users WHERE postnum > 0
table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
mybb_users 	ALL 					100001 	Using where
Query Time: 0.454200983047

Values might not be representative as it's only a test forum with generated content.

Posts: 10,000,000
Threads: 1,000,000
Members: 100,001

IMHO the stats page is a candidate for getting generated once or twice a day and otherwise cached.
The other queries are not a problem. The problem with the first is that it doesn't scale. I get 25,000 posts per day. It joins, groups, and then sorts. It's the GROUP BY that's a problem since it takes the data and places into temp table.

Try to change the dateline to something that will grab 25,000 posts. An 18GB dual quad core server with MySQL 5.5 should not take 30 seconds for a query. If it does then the query is a problem.

@frostschutz can you try my fix and see what the difference is on your test forum?
Okay I have another edit which imho is cleaner as it doesn't do the extra query for get_user. It's actually a slower query than my original though.

	default:
		$query = $db->query("
			SELECT u.uid, u.username, COUNT(p.pid) AS poststoday
			FROM ".TABLE_PREFIX."posts p
			LEFT JOIN ".TABLE_PREFIX."users u ON (p.uid=u.uid)
			WHERE p.dateline > $timesearch
			GROUP BY p.uid ORDER BY NULL DESC
		");
}
while($user = $db->fetch_array($query))
{
	$topposter[] = array('poststoday' =>$user['poststoday'],'uid' => $user['uid'], 'username' => $user['username'] );
}

$user = max($topposter);

I sort of like my original though as it's the faster version and it uses $db->simple_select().

HF AFTER NEW CHANGE: Generated in 0.9097099 seconds (25.50% PHP / 74.50% MySQL)
How fast is selecting 25000 posts for you with just dateline > x condition, if that results in 25k posts? That query alone is already dead slow for me (in a table with 10 million posts), because it does not even use the index for that query, for some reason. Going through the entire table then of course takes ages...
Do you have an index on dateline? Because I do. And I have 8.5 million posts in table. If you don't have a dateline index then add one.

Selecting 25k posts is no big deal. It's the GROUP BY where it places it into a temp table then sorts it again. GROUP BY X ORDER BY NULL appears to be very fast but of course then I have to sort the results with PHP which seems to work fine as well. It's either a 28 second query using MySQL to sort or a 1 second query sorting with PHP.

And are you MySQL 5.5 with InnoDB yet?
Yes, it just doesn't actually use it for that query. No, plain old MySQL 5.1.whatnot.

It's a sad story when PHP is faster at sorting & counting than the database itself.
Quote:It's a sad story when PHP is faster at sorting & counting than the database itself.

I do think this is a fault in MySQL 5.5 with InnoDB. But it's also because of the massive table sizes I have. My posts table is 5GB. For todays top poster the query shows it gets 50,000 rows. If you search for "slow mysql group by" you'll see similar tales. Issue is when Group By and Order By are used together. Each by themselves is perfectly fine.

http://www.mysqlperformanceblog.com/2007...optimized/

He says it best imho.

This is relevant too: http://www.mysqlperformanceblog.com/2008...e-results/
It seems that it uses the index only for small ranges. Same query, just with a different dateline value, one time it uses the key and one it does not. Databases work in mysterious ways sometimes... Smile

EXPLAIN SELECT *
FROM `mybb_posts`
WHERE dateline >1290861373 

id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
1 	SIMPLE 	mybb_posts 	ALL 	dateline 	NULL 	NULL 	NULL 	10000000 	Using where

EXPLAIN SELECT *
FROM `mybb_posts`
WHERE dateline >1291861373

id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
1 	SIMPLE 	mybb_posts 	range 	dateline 	dateline 	8 	NULL 	2071924 	Using where

Even with the key, it looks at way too many rows... I think this is partly my test forum at fault though. The timestamps aren't what you would find in a natural forum...
Pages: 1 2