MyBB Community Forums

Full Version: Load spider break
https://github.com/mybb/mybb/blob/featur...ession.php
foreach($spiders as $spider)
{
	if(my_strpos(my_strtolower($this->useragent), my_strtolower($spider['useragent'])) !== false)
	{
		$this->load_spider($spider['sid']);
	}
}

A "break;" needs to be added here; otherwise a spider can be loaded more than once.
I think this does not make sense.

After a discussion with @noyle

A break may be needed for better spider/bot parsing, but in real scenarios it would be rare for multiple spiders/bots to be loaded.
However, it depends.
For example, if one bot has the UA IAMBOT and another bot has the UA IAMBOTv2, and your database contains only IAMBOT, both bots will be loaded.
OK, here's the fact.

If the forum owner needs finer control over different spiders/bots from the same engine, the current method of comparing the UA string and creating a session might result in multiple spider/bot sessions being created.

Take Google's bots for example. According to the help page for Google crawlers, https://support.google.com/webmasters/an...1943?hl=en, various Google bots share common strings in their UAs.
The UA identifier of Googlebot (desktop & mobile) stored in the MyBB database is Googlebot (it really is this value by default) and the bot name is "Google".
If the forum wants to recognize a Google video/image/news bot, a UA identifier Googlebot-Video / Googlebot-Image / Googlebot-News should be added to the database with the bot name "Google Video" / "Google Image" / "Google News".
Then, if a Google video bot visits the forum, the current method would create two sessions: one for the bot named "Google" and the other for "Google Video".
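
The double match can be reproduced outside MyBB with a minimal sketch. The $spiders array below only mimics rows of the spiders table, matching_spiders() is a hypothetical helper standing in for the loop in the session class, plain stripos() replaces the my_strpos()/my_strtolower() pair, and the user agent strings are illustrative values rather than captured requests.

```php
<?php
// Hypothetical stand-in for the spider rows MyBB reads from its database.
$spiders = [
    ['sid' => 1, 'name' => 'Google',       'useragent' => 'Googlebot'],
    ['sid' => 2, 'name' => 'Google Video', 'useragent' => 'Googlebot-Video'],
];

// Returns the name of every spider whose identifier occurs in the user agent,
// mirroring the case-insensitive substring test of the loop without a break.
function matching_spiders(array $spiders, string $useragent): array
{
    $matched = [];
    foreach ($spiders as $spider) {
        if (stripos($useragent, $spider['useragent']) !== false) {
            $matched[] = $spider['name'];
        }
    }
    return $matched;
}

// Illustrative user agent for a Google video crawler.
$video = matching_spiders($spiders, 'Googlebot-Video/1.0');
// Illustrative user agent for the regular Googlebot.
$desktop = matching_spiders($spiders, 'Mozilla/5.0 (compatible; Googlebot/2.1)');

print_r($video);
print_r($desktop);
```

With these assumed rows, the video crawler matches both "Google" and "Google Video", while the regular Googlebot matches only "Google".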

Back to the OP's suggestion of adding a break;: if the UA identifiers are not carefully set, it would result in an incorrectly retrieved spider/bot id being stored in the database. But without a break, multiple bot sessions might be created.
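
The trade-off can be illustrated with a small sketch. first_match() is a hypothetical helper that behaves like the loop with a break; longest_match() is an alternative added purely for illustration (it is an assumption and not part of MyBB) that prefers the longest matching identifier. The rows and the user agent are made up for the example.

```php
<?php
// Returns the sid of the first matching entry, i.e. the loop with a break.
function first_match(array $spiders, string $useragent): ?int
{
    foreach ($spiders as $spider) {
        if (stripos($useragent, $spider['useragent']) !== false) {
            return $spider['sid']; // same effect as load_spider() followed by break
        }
    }
    return null;
}

// Illustrative alternative, not MyBB's method: keep the longest matching
// identifier so that the order of the rows no longer matters.
function longest_match(array $spiders, string $useragent): ?int
{
    $best = null;
    foreach ($spiders as $spider) {
        if (stripos($useragent, $spider['useragent']) === false) {
            continue;
        }
        if ($best === null || strlen($spider['useragent']) > strlen($best['useragent'])) {
            $best = $spider;
        }
    }
    return $best === null ? null : $best['sid'];
}

// Hypothetical rows: the generic identifier listed first, then the reverse order.
$generic_first = [
    ['sid' => 1, 'useragent' => 'Googlebot'],
    ['sid' => 2, 'useragent' => 'Googlebot-Video'],
];
$specific_first = array_reverse($generic_first);
$ua = 'Googlebot-Video/1.0'; // illustrative user agent

var_dump(first_match($generic_first, $ua));   // generic "Googlebot" row wins
var_dump(first_match($specific_first, $ua));  // specific row wins
var_dump(longest_match($generic_first, $ua)); // specific row, whatever the order
```

With a plain break, the result depends on which row the loop reaches first, which is why the identifiers (or their order) need care.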

To retrieve the spider/bot id more reliably, the UA identifiers should be carefully set. The crawler-user-agents project's https://github.com/monperrus/crawler-use...gents.json is a list of crawler UA patterns intended to identify crawlers correctly on a best-effort basis. For example, Googlebot is recognized by Googlebot/ (note the trailing slash).
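
To see why the trailing slash helps, here is a minimal sketch comparing the two identifiers. ua_contains() is a hypothetical helper that uses plain stripos() instead of MyBB's own string functions, and the two user agent strings are illustrative examples.

```php
<?php
// Case-insensitive substring test, the same kind of check the spider loop performs.
function ua_contains(string $useragent, string $identifier): bool
{
    return stripos($useragent, $identifier) !== false;
}

// Illustrative user agents for the regular Googlebot and the video crawler.
$desktop = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$video   = 'Googlebot-Video/1.0';

// The bare identifier matches both crawlers.
var_dump(ua_contains($desktop, 'Googlebot'));  // bool(true)
var_dump(ua_contains($video, 'Googlebot'));    // bool(true)

// With the trailing slash, only the regular Googlebot matches.
var_dump(ua_contains($desktop, 'Googlebot/')); // bool(true)
var_dump(ua_contains($video, 'Googlebot/'));   // bool(false)
```

Because "Googlebot-Video/1.0" has a hyphen right after "Googlebot", the identifier "Googlebot/" no longer overlaps with it.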

By the way, what I said, as relayed in @Eldenroot's post, is not entirely correct. The accurate version is:
Quote: For example, if one bot has the UA "IAMBOT" and another bot has the UA "IAMBOTv2", and your database has entries for both bots, both bots will be loaded when only the bot IAMBOTv2 is visiting the forum.