2011-01-28, 10:24 PM
So I was alerted today that my stats page took about 30 seconds to load. Sure enough the stats page did take forever and upon debug I saw that it was the "todays top poster" stats on stats.php.
Checking it out I saw the infamous GROUP BY statement.
Here is original query
So what this does is join users table, grabs the uid and pids from posts then groups them by uid and finally orders by the count...with the limit 1 it will return todays top poster.
However even with limit 1 it's going to do a complete table search as well as a join. Then the GROUP BY statement will place it all in a temporary table. My 4Gb posts table is of course not going to be optimal for this query to run. So I began to tear it apart looking for a solution. This is my end result:
And while I'm nearly positive my hack isn't optimally written. I do know that I'm getting the same results back in a much faster query (Query Time: 0.0951268672943). I do however do one additional query with get_user() but that's very quick as well (Query Time: 0.00208878517151). Both queries use indexes.
What my hack does is use PHP instead of MySQL in order to parse results. For small forums with less than 100k posts and only on the stats page this one query is not a problem. But as posts table grows you do not want to see table locks while this query runs slowing your whole site down.
I'm curious if any other big boarders want to try out my hack and compare the before and after of their stats.php debug.
Here are comparisons with SF and HF.
SF Before:Generated in 0.1038280 seconds (49.24% PHP / 50.76% MySQL)
SF After: Generated in 0.1028390 seconds (50.13% PHP / 49.87% MySQL)
HF Before: Generated in 27.9128520 seconds (0.34% PHP / 99.66% MySQL)
HF After: Generated in 0.6591330 seconds (27.88% PHP / 72.12% MySQL)
So end results are well worth it and are not detrimental for large or small forums but beneficial for large forums. MYBB should avoid GROUP BY on large tables at all costs.
Also a benefit of my change is the pgsql fix would not be needed as $db->simple_select works fine.
Checking it out I saw the infamous GROUP BY statement.
Here is original query
default:
$query = $db->query("
SELECT u.uid, u.username, COUNT(*) AS poststoday
FROM ".TABLE_PREFIX."posts p
LEFT JOIN ".TABLE_PREFIX."users u ON (p.uid=u.uid)
WHERE p.dateline > $timesearch
GROUP BY p.uid ORDER BY poststoday DESC
LIMIT 1
");
}
$user = $db->fetch_array($query);
So what this does is join users table, grabs the uid and pids from posts then groups them by uid and finally orders by the count...with the limit 1 it will return todays top poster.
However even with limit 1 it's going to do a complete table search as well as a join. Then the GROUP BY statement will place it all in a temporary table. My 4Gb posts table is of course not going to be optimal for this query to run. So I began to tear it apart looking for a solution. This is my end result:
default:
$query = $db->simple_select("posts","pid,uid","dateline > {$timesearch}");
}
// Custom
while ($user = $db->fetch_array($query))
{
$count[] = $user[uid];
}
$test = array_count_values($count);
arsort($test);
$user = get_user(key($test));
$user['poststoday'] = reset($test);
// Custom
And while I'm nearly positive my hack isn't optimally written. I do know that I'm getting the same results back in a much faster query (Query Time: 0.0951268672943). I do however do one additional query with get_user() but that's very quick as well (Query Time: 0.00208878517151). Both queries use indexes.
What my hack does is use PHP instead of MySQL in order to parse results. For small forums with less than 100k posts and only on the stats page this one query is not a problem. But as posts table grows you do not want to see table locks while this query runs slowing your whole site down.
I'm curious if any other big boarders want to try out my hack and compare the before and after of their stats.php debug.
Here are comparisons with SF and HF.
SF Before:Generated in 0.1038280 seconds (49.24% PHP / 50.76% MySQL)
SF After: Generated in 0.1028390 seconds (50.13% PHP / 49.87% MySQL)
HF Before: Generated in 27.9128520 seconds (0.34% PHP / 99.66% MySQL)
HF After: Generated in 0.6591330 seconds (27.88% PHP / 72.12% MySQL)
So end results are well worth it and are not detrimental for large or small forums but beneficial for large forums. MYBB should avoid GROUP BY on large tables at all costs.
Also a benefit of my change is the pgsql fix would not be needed as $db->simple_select works fine.