robots.txt for MyBB
#21
The term "disallow" has been used in this thread, and that is what my point was about, i.e. not all crawlers obey robots.txt: "However, a robots.txt is not enforceable, and some spammers and other troublemakers may ignore it." That statement also applies to some crawlers.

One way to block a page from being indexed (by Googlebot) is to put the proper meta tag in the head: “To entirely prevent a page's contents from being listed in the Google web index even if other sites link to it, use a noindex meta tag or x-robots-tag. As long as Googlebot fetches the page, it will see the noindex meta tag and prevent that page from showing up in the web index. The x-robots-tag HTTP header is particularly useful if you wish to limit indexing of non-HTML files like graphics or other kinds of documents.”
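
To check whether a page really sends a noindex signal, something like the following Python sketch could be used. It only looks for the word "noindex" in an X-Robots-Tag header or in a robots/googlebot meta tag; the class and function names (RobotsMetaParser, is_noindex) and the sample page are made up for illustration, and real crawlers may interpret more directives than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> or "googlebot" tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def is_noindex(html, headers=None):
    """Return True if the header or the meta tags contain 'noindex'."""
    headers = headers or {}
    # The HTTP header works for non-HTML files too, so check it first.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical sample inputs
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
plain = "<html><head></head></html>"

print(is_noindex(page))
print(is_noindex(plain, {"X-Robots-Tag": "noindex"}))
print(is_noindex(plain))
```

Keep in mind that a crawler can only see the tag if it is allowed to fetch the page, so a page carrying noindex should not also be disallowed in robots.txt.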

But don’t listen to me, do whatever floats your boat, I’m outta here. Sad
What’s popular isn’t always right and what’s right isn’t always popular.
Reply
#22
(06-09-2012, 09:27 AM)CAwesome Wrote:  It needs to be in the domain root, but you can simply change 'Disallow: /sendthread.php' to 'Disallow: /forums/sendthread.php' (etc)

Okay, that makes sense. Smile

Now, could I use more than one sitemap and link both in the file? (Because of the entry page plus two different scripts, it's impossible to use just one.) I know I can just submit them at Google Webmaster Tools for Google, but what about other search engines?
I fold for team 52482. Do you fold?
Reply
#23
(06-10-2012, 01:07 AM)Puppyite Wrote:  But don’t listen to me, do whatever floats your boat, I’m outta here. Sad

...What? We did listen, you just made a general point that wasn't correct in all cases. Many crawlers do obey robots.txt. Bots might not, but all reputable search engines will. If they didn't, webmasters around the world would be talking about them.
Reply
#24
(06-10-2012, 02:54 AM)MadComp Wrote:  Now could I use more than one sitemap (cause of the entry page, plus two different script it's impossible to use just one) and link both in the file?

Most search engines can read such files. But anyway, you can simply list all your sitemaps in robots.txt:

Sitemap: http://yoursite/sitemap1.xml
Sitemap: http://yoursite/forum/sitemap2.xml
Sitemap: http://yoursite/whatever/sitemap999.xml
etc...
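
To confirm that several Sitemap lines are picked up, a quick check with Python's standard urllib.robotparser could look like this. It is only a sketch: it assumes Python 3.8 or newer (for site_maps()), and the robots.txt content is a shortened, made-up example using the placeholder URLs from above.

```python
from urllib.robotparser import RobotFileParser

# Shortened example file; the rule and URLs are placeholders.
robots_txt = """\
User-agent: *
Disallow: /forum/sendthread.php

Sitemap: http://yoursite/sitemap1.xml
Sitemap: http://yoursite/forum/sitemap2.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns None when no Sitemap lines were found.
sitemaps = parser.site_maps() or []
for url in sitemaps:
    print(url)
```

This only shows how one parser reads the file; each search engine uses its own parser.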
Reply
#25
(06-10-2012, 08:35 AM)Maj Wrote:  
(06-10-2012, 02:54 AM)MadComp Wrote:  Now could I use more than one sitemap (cause of the entry page, plus two different script it's impossible to use just one) and link both in the file?

Most of search engines can read such files. But anyway - you can simply add all your sitemaps in robots.txt:

Sitemap: http://yoursite/sitemap1.xml
Sitemap: http://yoursite/forum/sitemap2.xml
Sitemap: http://yoursite/whatever/sitemap999.xml
etc...

Thanks! Makes my life a lot less complicated! Smile
I fold for team 52482. Do you fold?
Reply
#26
Question about robots.txt

If this is my file located in the root directory:

Quote:
User-Agent: *
Disallow: /captcha.php
Disallow: /editpost.php
Disallow: /memberlist.php
Disallow: /modcp.php
Disallow: /moderation.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /ratethread.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /sendthread.php
Disallow: /task.php
Disallow: /usercp.php
Disallow: /usercp2.php
Disallow: /calendar.php
Disallow: /*action=emailuser*
Disallow: /*action=nextnewest*
Disallow: /*action=nextoldest*
Disallow: /*year=*
Disallow: /*action=weekview*
Disallow: /*sort=*
Disallow: /*order=*
Disallow: /*mode=*
Disallow: /*datecut=*
Allow: /

and my forum is in a subdirectory, will it still work? My forum is at http://www.mysite.com/forum/

Do I need to add /forum in front of all those?
Reply
#27
I don't know, but I think it's better to add /forum.
My robots.txt is here, and it works perfectly.

I also suggest using the Google SEO plugin.
Reply
#28
PsuedoK Wrote:  Question about robots.txt

If this is my file located in the root directory:

Quote: Do I need to add /forum in front of all those?

Yes, you need to add /forum/ (your forum's directory) in front of each path.
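
If you don't want to edit every line by hand, a small Python sketch like the one below could add the prefix and show the difference. The helper names (prefix_rules, blocks) are made up, the rule list is shortened, and the check uses Python's urllib.robotparser, which as far as I know does not understand the * wildcard lines, so only plain paths are tested here.

```python
from urllib.robotparser import RobotFileParser


def prefix_rules(lines, prefix):
    """Prepend `prefix` to the path of every Allow/Disallow rule."""
    out = []
    for line in lines:
        field, sep, value = line.partition(":")
        path = value.strip()
        is_rule = field.strip().lower() in ("allow", "disallow")
        if sep and is_rule and path.startswith("/"):
            out.append(f"{field.strip()}: {prefix.rstrip('/')}{path}")
        else:
            out.append(line)  # User-agent, Sitemap, comments, blank lines
    return out


def blocks(lines, url):
    """Return True if these robots.txt lines disallow `url` for all agents."""
    parser = RobotFileParser()
    parser.parse(lines)
    return not parser.can_fetch("*", url)


# Shortened version of the file from the question
original = [
    "User-agent: *",
    "Disallow: /sendthread.php",
    "Disallow: /usercp.php",
]
prefixed = prefix_rules(original, "/forum")
url = "http://www.mysite.com/forum/sendthread.php"

print(prefixed)
print(blocks(original, url))   # rules without the prefix
print(blocks(prefixed, url))   # rules with /forum added
```

Rules are matched from the start of the URL path, which is why a rule for /sendthread.php does not cover /forum/sendthread.php.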

My Robots
Reply
#29
Thanks for the tips.

Currently this is the robots.txt I am using:
Code:
User-agent: *
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /captcha.php
Disallow: /editpost.php
Disallow: /member.php?action=emailuser
Disallow: /member.php?action=login
Disallow: /member.php?action=logout
Disallow: /member.php?action=lostpw
Disallow: /member.php?action=register
Disallow: /memberlist.php
Disallow: /misc.php?action=markread
Disallow: /modcp.php
Disallow: /moderation.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /ratethread.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendthread.php
Disallow: /showteam.php
Disallow: /stats.php
Disallow: /task.php
Disallow: /usercp.php
Disallow: /usercp2.php

PS: I've added an entry to disallow attachment.php to stop Google from trying to download attachments.
-- NewEraCracker
Reply

