MyBB Community Forums

Robots.txt and sitemap question
Hi there,

My website is set up with two WordPress blogs: one in the root directory (www.mydomain.com) and one at www.mydomain.com/de.

My MyBB forum is at www.mydomain.com/forum, and I do have a sitemap, http://www.mydomain.com/forum/sitemap-index.xml, which contains only forum-related stuff.

Now, I want search engines to fully crawl my blogs but not my entire forum. That's why I created a robots.txt.
If I put the robots.txt in my website's root folder (www.mydomain.com/robots.txt), will the Sitemap line make bots crawl only the forum and not my blogs anymore?
Or should I upload the robots.txt to www.mydomain.com/forum/robots.txt instead?
I want to make sure that the bots crawl my blogs but not my entire forum.

Any help is greatly appreciated. Thanks in advance.

Here is the content of my robots.txt, btw:

Sitemap: http://www.mydomain.com/forum/sitemap-index.xml



User-Agent: *

Disallow: /forum/captcha.php

Disallow: /MyBB/editpost.php

Disallow: /MyBB/misc.php

Disallow: /MyBB/modcp.php

Disallow: /MyBB/moderation.php

Disallow: /MyBB/newreply.php

Disallow: /MyBB/newthread.php

Disallow: /MyBB/online.php

Disallow: /MyBB/printthread.php

Disallow: /MyBB/private.php

Disallow: /MyBB/ratethread.php

Disallow: /MyBB/report.php

Disallow: /MyBB/reputation.php

Disallow: /MyBB/search.php

Disallow: /MyBB/sendthread.php

Disallow: /MyBB/task.php

Disallow: /MyBB/usercp.php

Disallow: /MyBB/usercp2.php

Disallow: /MyBB/calendar.php

Disallow: /MyBB/*action=emailuser*

Disallow: /MyBB/*action=nextnewest*

Disallow: /MyBB/*action=nextoldest*

Disallow: /MyBB/*year=*

Disallow: /MyBB/*action=weekview*

Disallow: /MyBB/*action=nextnewest*

Disallow: /MyBB/*action=nextoldest*

Disallow: /MyBB/*sort=*

Disallow: /MyBB/*order=*

Disallow: /MyBB/*mode=*
Disallow: /MyBB/*datecut=*


Allow: /
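The `*` patterns in the lines above are an extension that major crawlers such as Googlebot honor (RFC 9309 later standardized it, together with a trailing `$` end anchor); the original exclusion convention only knew plain path prefixes. A minimal Python sketch of how a crawler following RFC 9309 might evaluate such rules, assuming the /forum/ prefix the forum actually lives under; `pattern_to_regex`, `rule_matches` and `is_allowed` are hypothetical helper names, not any crawler's real code:

```python
from __future__ import annotations

import re


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters, a trailing '$' pins the end of the URL,
    and everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    # re.match only anchors at the start, so a rule is a prefix match by default
    return pattern_to_regex(pattern).match(path) is not None


def is_allowed(rules: list[tuple[bool, str]], path: str) -> bool:
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best = None  # (pattern length, allow flag)
    for allow, pattern in rules:
        if rule_matches(pattern, path):
            candidate = (len(pattern), allow)
            if best is None or candidate > best:
                best = candidate
    # no matching rule at all means the path is not excluded
    return True if best is None else best[1]


# (allow?, pattern) pairs modeled on the posted file, with /forum/ as prefix
rules = [
    (False, "/forum/*action=emailuser*"),
    (False, "/forum/*sort=*"),
    (False, "/forum/newreply.php"),
    (True, "/"),
]

print(is_allowed(rules, "/forum/member.php?action=emailuser&uid=1"))  # False
print(is_allowed(rules, "/forum/showthread.php?tid=5"))  # True
print(is_allowed(rules, "/de/"))  # True
```

Precedence here follows RFC 9309: the longest matching pattern wins, so the final `Allow: /` does not override the more specific Disallow lines.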

If it's /forum/, you should replace /MyBB/ with /forum/ there. You should probably also get rid of the extra newlines.
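Both suggestions are easy to check locally. Crawlers differ in how they treat blank lines, but parsers that follow the original record format end a `User-agent` record at the first blank line, and Python's standard-library `urllib.robotparser` is one of them. A sketch using it (the `blocked` helper and the three sample files are made up for illustration and say nothing about how any particular search engine behaves):

```python
from urllib.robotparser import RobotFileParser


def blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if Python's stdlib parser would refuse `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


URL = "http://www.mydomain.com/forum/newreply.php?tid=1"

# Layout as posted: blank line after User-Agent, rules under /MyBB/
as_posted = "User-Agent: *\n\nDisallow: /MyBB/newreply.php\n\nAllow: /\n"
# Blank lines removed, but the rule still points at /MyBB/
compact_mybb = "User-Agent: *\nDisallow: /MyBB/newreply.php\nAllow: /\n"
# Blank lines removed and the prefix changed to /forum/
compact_forum = "User-Agent: *\nDisallow: /forum/newreply.php\nAllow: /\n"

print(blocked(as_posted, URL))      # False: the blank line ended the record
print(blocked(compact_mybb, URL))   # False: /MyBB/ never matches /forum/
print(blocked(compact_forum, URL))  # True
```

With the posted layout this parser discards the record at the first blank line, so none of the Disallow lines apply; once the blank lines are gone, only the /forum/ prefix actually matches the forum's URLs.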

Sitemaps are not exclusive - you can have several of them, and anything that's not included in them may still be found by crawling, as long as it's not disallowed by robots.txt. So having a sitemap for your forum won't affect how your blog is being indexed.
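Python's `urllib.robotparser` also shows that a `Sitemap:` line is only recorded as a hint and does not limit what may be crawled. A sketch assuming a shortened, corrected version of the posted file (only three of the Disallow rules kept, /forum/ prefix, no blank lines); `Googlebot` is just an example agent name, and `site_maps()` needs Python 3.8 or later:

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down version of the posted file: /forum/ prefix, no blank lines
ROBOTS_TXT = """\
Sitemap: http://www.mydomain.com/forum/sitemap-index.xml
User-Agent: *
Disallow: /forum/captcha.php
Disallow: /forum/newreply.php
Disallow: /forum/private.php
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Sitemap line is stored separately from the rules; it restricts nothing
print(parser.site_maps())

for url in (
    "http://www.mydomain.com/",                            # English blog
    "http://www.mydomain.com/de/",                         # German blog
    "http://www.mydomain.com/forum/showthread.php?tid=5",  # forum thread
    "http://www.mydomain.com/forum/private.php",           # disallowed script
):
    print(url, parser.can_fetch("Googlebot", url))
```

Only the explicitly disallowed forum script comes back as not fetchable; both blogs stay crawlable even though the only sitemap listed is the forum's.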
(2010-04-25, 12:03 AM) frostschutz Wrote: If it's /forum/, you should replace /MyBB/ with /forum/ there. You should probably also get rid of the extra newlines.

Sitemaps are not exclusive - you can have several of them, and anything that's not included in them may still be found by crawling, as long as it's not disallowed by robots.txt. So having a sitemap for your forum won't affect how your blog is being indexed.

Thanks! Should I upload the robots.txt to www.mydomain.com/robots.txt or to www.mydomain.com/forum/robots.txt?
forum/robots.txt wouldn't work; it has to be in the root folder of the domain.
(2010-04-25, 12:36 AM) frostschutz Wrote: forum/robots.txt wouldn't work; it has to be in the root folder of the domain.

Thanks
(2010-04-25, 12:36 AM) frostschutz Wrote: forum/robots.txt wouldn't work; it has to be in the root folder of the domain.
That's not true. You can still have it in the forum directory and not have an issue.
Sure, you won't have an issue, but it also won't do anything. Search engines won't even look for a robots.txt in a /forum/ subfolder, because the standard explicitly specifies that it has to be in the root folder of a domain.

http://en.wikipedia.org/wiki/Robots_exclusion_standard Wrote: If a site owner wishes to give instructions to web robots he must place a text file called robots.txt to the root of the web site hierarchy (e.g. http://www.example.com/robots.txt).
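Put differently, a crawler derives the robots.txt location from a page URL by keeping only the scheme and host and replacing the whole path, so no subfolder file is ever requested. A minimal Python sketch of that rule (`robots_url` is a hypothetical helper):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location a crawler checks for `page_url`.

    Path, query and fragment are dropped: the file always lives at the root
    of the scheme + host (+ port) that served the page.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.mydomain.com/forum/showthread.php?tid=5"))
# http://www.mydomain.com/robots.txt
print(robots_url("http://www.mydomain.com/de/"))
# http://www.mydomain.com/robots.txt
```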
So does that count if you're on a subdomain of a site?
Like my site below?
Every subdomain needs its own robots.txt, if that's what you're asking.

e.g. domain.com/forum/robots.txt is wrong,
but forum.domain.com/robots.txt is right.
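Because the lookup is per host, rules in domain.com/robots.txt never reach forum.domain.com, and vice versa. A sketch that keeps one parsed file per host with Python's `urllib.robotparser` (the host names, file contents and the `can_fetch` wrapper are illustrative assumptions):

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# One robots.txt per host (hypothetical contents)
PARSERS = {
    "domain.com": make_parser("User-agent: *\nDisallow: /forum/\n"),
    "forum.domain.com": make_parser("User-agent: *\nDisallow: /private.php\n"),
}


def can_fetch(url: str, agent: str = "Googlebot") -> bool:
    """Check `url` only against the robots.txt of the host that serves it."""
    host = urlsplit(url).hostname
    parser = PARSERS.get(host)
    if parser is None:
        return True  # in this sketch, a host without a robots.txt excludes nothing
    return parser.can_fetch(agent, url)


print(can_fetch("http://domain.com/forum/index.php"))    # False
print(can_fetch("http://forum.domain.com/index.php"))    # True
print(can_fetch("http://forum.domain.com/private.php"))  # False
```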
So in other words, if I have MyBB in public_html, then I should upload the robots.txt to that folder. Here is a public tool to check your robots.txt:
http://tool.motoricerca.info/robots-checker.phtml

Also, your format is incorrect with the extra spaces, and it should look like this, for example:

/forum/captcha/
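Whether that trailing slash is what you want depends on what the rule should cover: each Disallow value is compared as a prefix of the URL path, so `/forum/captcha/` covers a directory, while the posted file blocks the script `/forum/captcha.php`. A quick check with Python's `urllib.robotparser` (the `allowed` helper is a made-up name):

```python
from urllib.robotparser import RobotFileParser


def allowed(rule_path: str, url_path: str) -> bool:
    """True if a file with the single rule `Disallow: rule_path` permits url_path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return parser.can_fetch("Googlebot", "http://www.mydomain.com" + url_path)


# Rules are plain prefixes, so the trailing slash decides what is covered
print(allowed("/forum/captcha/", "/forum/captcha.php"))      # True (not blocked)
print(allowed("/forum/captcha/", "/forum/captcha/img.png"))  # False
print(allowed("/forum/captcha", "/forum/captcha.php"))       # False
```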

Just thought I would help a bit.