MyBB Community Forums

Full Version: support for third party SEO
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2
I'm currently in the middle of developing a SEO plugin for MyBB, called Google SEO plugin. The goal of this plugin is to implement the guidelines of the recently released Google's SEO starter guide.

Regarding the implementation of custom SEO URLs, I found that MyBB has a very nice API for creating links to the most important pages, in inc/functions.php:
function get_forum_link($fid, $page=0)
function get_thread_link($tid, $page=0, $action='')
function get_post_link($pid, $tid=0)
/* ...and so on and so forth... */

Naturally in order to implement custom SEO urls, my plugin hooks into these functions to gain support for SEO urls in all MyBB and even in third party plugins. Or so I thought. It only works when this API is actually used.

So far I found that MyBB itself uses its own API basically everywhere, except in two places that involve multipage().

In forumdisplay.php it's this part:
// Assemble page URL
if($mybb->input['sortby'] || $mybb->input['order'] || $mybb->input['datecut']) // Ugly URL
{
    $page_url = str_replace("{fid}", $fid, FORUM_URL_PAGED);

    if($mybb->settings['seourls'] == "yes" || ($mybb->settings['seourls'] == "auto" && $_SERVER['SEO_SUPPORT'] == 1))
    {
        $q = "?";
        $and = '';
    }
    else
    {
        $q = '';
        $and = "&";
    }

    if($sortby != "lastpost")
    {
        $page_url .= "{$q}{$and}sortby={$sortby}";
        $q = '';
        $and = "&";
    }

    if($sortordernow != "desc")
    {
        $page_url .= "{$q}{$and}order={$sortordernow}";
        $q = '';
        $and = "&";
    }

    if($datecut > 0 && $datecut != 9999)
    {
        $page_url .= "{$q}{$and}datecut={$datecut}";
    }
}
else
{
    $page_url = str_replace("{fid}", $fid, FORUM_URL_PAGED);
}
$multipage = multipage($threadcount, $perpage, $page, $page_url);

In showthread.php it's this part:
        // Work out if we have terms to highlight
        $highlight = "";
        if($mybb->input['highlight'])
        {
            if($mybb->settings['seourls'] == "yes" || ($mybb->settings['seourls'] == "auto" && $_SERVER['SEO_SUPPORT'] == 1))
            {
                $highlight = "?highlight=".urlencode($mybb->input['highlight']);
            }
            else
            {
                $highlight = "&highlight=".urlencode($mybb->input['highlight']);
            }
        }

        $multipage = multipage($postcount, $perpage, $page, str_replace("{tid}", $tid, THREAD_URL_PAGED.$highlight));

It would help me very much if the get_forum_link() / get_thread_link() functions were used instead of FORUM_URL_PAGED / THREAD_URL_PAGED in these two locations.

Change proposal for forumdisplay.php:
// Assemble page URL
$page_url = get_forum_link($fid, "{page}");

if($mybb->input['sortby'] || $mybb->input['order'] || $mybb->input['datecut']) // Ugly URL
{
    if(!strstr($page_url, "?"))
    {
        $q = "?";
        $and = '';
    }
    else
    {
        $q = '';
        $and = "&";
    }

    if($sortby != "lastpost")
    {
        $page_url .= "{$q}{$and}sortby={$sortby}";
        $q = '';
        $and = "&";
    }

    if($sortordernow != "desc")
    {
        $page_url .= "{$q}{$and}order={$sortordernow}";
        $q = '';
        $and = "&";
    }

    if($datecut > 0 && $datecut != 9999)
    {
        $page_url .= "{$q}{$and}datecut={$datecut}";
    }
}
$multipage = multipage($threadcount, $perpage, $page, $page_url);

Change proposal for showthread.php:
        // Work out if we have terms to highlight
        $page_url = get_thread_link($tid, "{page}");

        if($mybb->input['highlight'])
        {
            if(!strstr($page_url, "?"))
            {
                $page_url .= "?highlight=".urlencode($mybb->input['highlight']);
            }
            else
            {
                $page_url .= "&highlight=".urlencode($mybb->input['highlight']);
            }
        }

        $multipage = multipage($postcount, $perpage, $page, $page_url);

I'm aware that this is only a hack (page is supposed to be a number and not "{page}" when passed to get_*_link()). This is probably bad style and it prevents a third party SEO to do something interesting like replace page numbers with number words (thread-one-page-one and no I'm not doing that). For a better solution multipage() itself would have to be made aware of the get_*_link() API and call it once for every page link it produces, which may be both more complicated and expensive.

Thanks for your attention.
I think this would be a good idea, as well. Not just for frostschutz's plugin, but just in general.
I've actually written my own private SEO plugin, and I would discourage the use of those get_*_link functions actually. Unfortunately, there's no way to pull in a cache of everything that way (you're forced to query for each function call, which is very inefficient). Parsing the page generally works a bit better, although preg_* functions are slow-ish. (they'll also get around hard-coded URLs that users do in template edits, for example)

But anyway, back to your point, it seems to be a reasonable suggestion Toungue
Parsing the output is definitely not the way to go. It's too expensive. And preg_* is not a parser. And as for URLs hard coded by the user, I don't concern myself with them. The user has the same issues, when switching between built-in normal and SEO mode. What I already do is redirect pages, so even the "wrong" multipage links do end up in the correct location, but it would be nice to have the correct links where they are generated by MyBB in the first place.

Regarding performance, yes, I do have some queries in a get_*_link() call. However I also do cache the results of those queries. So if links to the same page is requested more than once in a page (which is almost always the case, e.g. the links to all posts and pages of a thread, or links to the thread and to the lastpost of a thread in the forum display, etc.), a query is done only for the first call, the subsequent calls are cached. There's room for improvement of this caching mechanism, and room for optimization in the plugin in general, but so far there have been no issues with performance whatsoever.

The functions are there and they are useful, so I use them. Besides with the parsing approach I would still have to run the very same queries I'm running in the get_*_link() functions, so there is nothing to be gained.
Found another place where the SEO links are not being used: The sorting links in forumdisplay pages.

For example the Link to the Suggestions and Feedback forum here is:
http://community.mybboard.net/forum-7.html

However the sorting links, for example sort by replies, is:
http://community.mybboard.net/forumdispl...order=desc

The difference to the multipage links is, where multipage() reimplements the get_*_link() API, the sorting links do not even try to produce their SEO counterpart URLs.

In this case it's actually an advantage... sorting URLs are an annoyance to Google since Google can't tell that it's the same content just with different sorting. So if you use SEO URLs, you can simply use robots.txt to block forumdisplay.php etc. completely. Without SEO URLs you can still use robots.txt but with more sophisticated rules that prevent usage of datecut, sortby, order parameters in the dynamic part.

So I'm kinda torn here, on one side I'd prefer if MyBB used the get_*_link() everywhere and not produce forumdisplay.php?fid= URLs when forum-x.html URLs are enabled, on the other hand in this case it makes blocking unwanted URLs much easier.
(2008-12-21, 11:40 PM)frostschutz Wrote: [ -> ]Regarding performance, yes, I do have some queries in a get_*_link() call. However I also do cache the results of those queries. So if links to the same page is requested more than once in a page (which is almost always the case, e.g. the links to all posts and pages of a thread, or links to the thread and to the lastpost of a thread in the forum display, etc.), a query is done only for the first call, the subsequent calls are cached. There's room for improvement of this caching mechanism, and room for optimization in the plugin in general, but so far there have been no issues with performance whatsoever.
The problem is you can't forward predict what links are going to appear.
On pages like forumdisplay, you could generate the URL from the title during the loop, so you actually don't need to query at all, however, this may not be true for all cases (can't think of any examples now). For example, if you couldn't do this on forumdisplay, a page with 50 threads would require 50 queries - every time someone decides to visit forumdisplay.
Performance may not be too horrible if the MySQL server is relatively free, and probably if it's installed on localhost, but querying external MySQL servers is a lot more costly, in terms of performance (thus why we generally try to keep the number of queries down).

Now that I think about it a little more, your approach probably isn't so bad. I'd probably still stick to replacing URLs which couldn't be replaced inline, at shutdown, to avoid a worst case of "over-querying" the DB.

(2008-12-21, 11:40 PM)frostschutz Wrote: [ -> ]The functions are there and they are useful, so I use them. Besides with the parsing approach I would still have to run the very same queries I'm running in the get_*_link() functions, so there is nothing to be gained.
You'd only have to run one query at most (SELECT * FROM ___ WHERE id IN (<id list>)) - as opposed to a worst case scenario, which could be 50 or more queries...


(2009-01-10, 11:27 PM)frostschutz Wrote: [ -> ]So I'm kinda torn here, on one side I'd prefer if MyBB used the get_*_link() everywhere and not produce forumdisplay.php?fid= URLs when forum-x.html URLs are enabled, on the other hand in this case it makes blocking unwanted URLs much easier.
Sorting URLs do get a little messy with complicated query strings...
But I agree that there probably should be more consistency regarding your first post, and the above post - either use the original .php links for both, or tack on query strings in both instances to the SEO'd URL.
(2009-01-11, 04:42 AM)Yumi Wrote: [ -> ]Performance may not be too horrible if the MySQL server is relatively free, and probably if it's installed on localhost, but querying external MySQL servers is a lot more costly, in terms of performance (thus why we generally try to keep the number of queries down).

I guess it would matter on big forums that get more than 1 hit per second on average - they'd probably do anything to reduce load, and definitely not install any plugins that add however little load. I'm not running such a forum. For me, 50 additional queries isn't an issue. A database is supposed to be able to deal with a lot more queries than that. As long as I see page generation times of <0.1 seconds I'm happy.

It could be reduced to a single query by hooking into forumdisplay etc. at the specific point where forumdisplay knows exactly which threads it is going to display. Also a more sophisticated caching mechanism could be implemented that catches way more queries at a memory usage tradeoff. However I'm not going that far as long as I'm not seeing any performance issues in my forum.

Quote:You'd only have to run one query at most (SELECT * FROM ___ WHERE id IN (<id list>)) - as opposed to a worst case scenario, which could be 50 or more queries...

Yes, but I don't get the <id list> for free either. I traded a couple of database queries for a lot of searching, parsing, and replacing strings on a possibly huge page. I just don't like that kind of solution. If I can get the id list for free, for example by hooking into forumdisplay.php at the exact point where it knows which threads it is going to display, it can be done, but most likely it would require another modification to the core MyBB files, and I want to keep the number of these modifications low.

If you have a very slow database it could pay off, but if you have a slow database you can shoot yourself anyway. Databases are supposed to be fast, and there shouldn't be an issue with a couple of very simple queries.

Thanks for your input on optimization, I really appreciate it. Smile
(2009-01-11, 09:57 AM)frostschutz Wrote: [ -> ]Yes, but I don't get the <id list> for free either.
A preg_match would capture the id list (you can capture the offsets as well so you don't have to run a second preg_replace if you consider it slow). You could also hook at places such as forumdisplay_thread and build the list of tids from there.

(2009-01-11, 09:57 AM)frostschutz Wrote: [ -> ]I traded a couple of database queries for a lot of searching, parsing, and replacing strings on a possibly huge page. I just don't like that kind of solution. If I can get the id list for free, for example by hooking into forumdisplay.php at the exact point where it knows which threads it is going to display, it can be done, but most likely it would require another modification to the core MyBB files, and I want to keep the number of these modifications low.
If that's the case, it's possible to overwrite the DB object and gain control over it. Eg:
$plugins->add_hook('forumdisplay_announcement', 'myfunc');
function myfunc() {
	global $db;
	eval('
		class myCustomDb extends '.get_class($db).'
		{
			function myCustomDb(&$olddb)
			{
				$a = get_object_vars($olddb);
				foreach($a as $k => $v)
					$this->$k = $v;
				$this->myplugin_hook_mode = false;
			}
			function query(<stick argument list here>)
			{
				static $done=false;
				if($this->myplugin_hook_mode)
					$this->myplugin_hook_mode = false;
				if(!$done) {
					$done = true;
					$this->myplugin_hook_mode = true;
				}
				return parent::query(<args>);
			}
			function fetch_array($query)
			{
				$a = parent::fetch_array($query);
				if($this->myplugin_hook_mode) {
					global $myplugin_tids;
					$myplugin_tids[] = $a[\'tid\'];
				}
				return $a;
			}
		}
	');
	
	$db = new myCustomDb($db);
}
^ Dirty, and potentially a little problematic if another plugin hooks there and queries (you can try use priorities to avoid that, and perhaps check the query string to see if it's the one you want), but should work in most cases.

I'll let you decide on whether you prefer querying over parsing. Otherwise, I can't really help you much more with your suggestion (I'm not the one to decide them).

Hope that helps.
forumdisplay_thread is called once for every thread when I only need the first call. It could be done by unregistering the hook in the first call. Strange approach but it could work. I'll experiment with it...
(2009-01-11, 11:01 AM)frostschutz Wrote: [ -> ]It could be done by unregistering the hook in the first call.

But then you'd break all the other plugins trying to use that hook. Just globalize a variable and check if it's 1 or 0 and just "return" if it's 1.

And 50 queries? Really? Something like that would increase the load on a large forum 2-5 times. It's not that the queries aren't fast, it's that every time you query a MyISAM table you lock it from other queries until it finishes that query. If you even get 1 single query that locks the table for too long on a forum with a high amount of users (say 1000 online) then apache starts to backup requests into RAM. Once RAM is out of memory, it overflows into the disk and your server is pretty much dead from there. That's essentially a DoS attack

That's why I've had to go to extreme measures on a large forum I help run to write a script that is run by a cron every minute that looks at any active queries, and checks how much time they're taking. If it's taking too long then it kills it and it logs it for future analysis by myself. From there I can make appropriate optimizations. I have already several optimizations coming for MyBB 1.4.5 because of the scripts I have implemented. MyBB 1.2.7 and MyBB 1.4 was also a major result of those efforts.
Pages: 1 2