MyBB Community Forums

Full Version: Create a specific user group for spiders/bots
Hello,

I have been trying to improve control over which content search engines can and cannot crawl.

On Admin CP - Spiders / Bots it says:
  • User Group
  • Select the user group permissions will be applied from for this board (Note: It is not recommended you change this from the default Guests group)

The default is the Guests group, and I am happy with that for unregistered members, but I want to define which forums should be crawled,
because we want to make some forums available only to senior members (and those will mean more pages to index).

Is creating a user group named "crawler" a wise decision?



Many thanks
Klaus
It's a good decision; just keep in mind that you'll have to edit each spider to assign it to the right group.
I think the reason this is discouraged is that search engines expect to see the same content other users see. If a guest can see content that the search engine can't, or vice versa, the search engine won't like it.

I might be wrong.

It might be possible that you can achieve this by editing the robots.txt file?
Thanks Crazy Cat and Omar G. for answering!

Guests and Spiders/Bots can see which forums exist but can't see their content, and that generates many soft 404 pages in Google Search Console.

Using robots.txt was one option, but from what I have been learning, some crawlers index your forum or site while ignoring robots.txt.

Changing the user group of each spider one by one isn't an issue, and as the forum grows it will be easier to control what should or shouldn't be indexed, for example when creating additional forums.
That will also reduce many of the annoying soft 404s.

One last question: do you use the standard robots.txt configuration from the plugin?

This is how I am preparing mine, but I have seen different ones:

# MyBB robots.txt
User-agent: *
Disallow: /captcha.php
Disallow: /editpost.php
Disallow: /modcp.php
Disallow: /moderation.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /ratethread.php
Disallow: /report.php
Disallow: /sendthread.php
Disallow: /task.php
Disallow: /usercp.php
Disallow: /usercp2.php
Disallow: /archive
Disallow: /online.php
Disallow: /calendar.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /*nextoldest*
Disallow: /*nextnewest*
Disallow: /*datecut*
Disallow: /*lastpost*
Disallow: /*markread*
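
One way to sanity-check rules like these before uploading them is to run a few forum URLs through a robots.txt parser. The following is only a minimal sketch using Python's standard-library `urllib.robotparser`. It assumes a small subset of the rules above, and the helper name `is_allowed` and the sample URLs are hypothetical. The wildcard lines (such as `/*nextoldest*`) are left out on purpose, because the standard-library parser only does plain prefix matching and does not interpret `*` inside a path.

```python
from urllib.robotparser import RobotFileParser

# Subset of the rules above (assumption: wildcard lines are omitted because
# urllib.robotparser matches paths by prefix only, not by pattern).
ROBOTS_TXT = """\
User-agent: *
Disallow: /usercp.php
Disallow: /private.php
Disallow: /search.php
Disallow: /archive
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical sample URLs for illustration.
    for path in ("/showthread.php?tid=1", "/usercp.php", "/archive/index.php"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

This only shows how a well-behaved crawler would read the file; as discussed in this thread, it says nothing about crawlers that ignore robots.txt.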

Thanks again
I would advise against this, as you will get penalised for cloaking. Googlebot does not always browse with a Googlebot user agent; sometimes it browses with the exact same user agent as a regular user, so it's impossible to tell it's a crawler, and it will be given the regular unregistered (Guests) group. If Google then detects that you are serving different content to Googlebot than a regular guest would see, you will be hammered in the rankings, because a visitor won't see the same page when they click through. It's a good way to ensure your site never ranks for anything, because they take a really dim view of it.
A short piece of personal advice about robots.txt: most crawlers follow its directives, but not all, so don't rely on it for everything. Also, a long list of rules in robots.txt can reveal a lot about your site: many pen-testers (I won't say hackers) read robots.txt first to see the structure of your site and work out which engine you use, where the restricted areas are, and so on.
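
To illustrate how little effort that takes, here is a minimal sketch in Python that lists every `Disallow` path from the text of a robots.txt file. The function name `disallowed_paths` and the sample input are hypothetical; the point is only that the file is public and trivially readable, so it should not be the only thing protecting a restricted area.

```python
def disallowed_paths(robots_txt):
    """Return the Disallow paths found in a robots.txt text, in file order."""
    paths = []
    for raw in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # Keep only non-empty Disallow values (field names are case-insensitive).
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    # Hypothetical sample input based on the rules posted earlier in the thread.
    sample = "# MyBB robots.txt\nUser-agent: *\nDisallow: /modcp.php\nDisallow: /usercp.php\n"
    print(disallowed_paths(sample))
```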