MyBB Community Forums

Full Version: Googlebot DDOS!!
Googlebot tried to DDoS my site in the past week, using over 200 GB of bandwidth!! Can I limit it so it doesn't browse my board so much?? It failed to take the site offline, but it still used a huge amount of bandwidth!
http://www.robotstxt.org/

Disallow it access to stuff it doesn't really need to browse.
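
For instance, a minimal robots.txt along these lines blocks a couple of pages and asks polite bots to slow down (the paths here are just examples; also note that Googlebot ignores Crawl-delay, so Google's own rate has to be throttled in Webmaster Tools, while some other bots do honor it):

User-agent: *
Crawl-delay: 10
Disallow: /newreply.php
Disallow: /printthread.php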
Err, err... here's what my robots.txt file has looked like for a while now, and Googlebot still wants to crawl the reply pages. I see them in some search results, which makes me think it's ignoring some rules!!!

Sitemap: http://vgchat.info/sitemap-index.xml

User-Agent: *
Disallow: /captcha.php
Disallow: /editpost.php
Disallow: /misc.php
Disallow: /modcp.php
Disallow: /moderation.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /ratethread.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /sendthread.php
Disallow: /task.php
Disallow: /usercp.php
Disallow: /usercp2.php
Disallow: /calendar.php
Disallow: /*action=emailuser*
Disallow: /*action=nextnewest*
Disallow: /*action=nextoldest*
Disallow: /*year=*
Disallow: /*action=weekview*
Disallow: /*sort=*
Disallow: /*order=*
Disallow: /*mode=*

Er, according to the Who's Online activity list, the Google AdSense bot grabs a thread and then the regular Googlebot goes after that same thread within seconds!!

According to AWStats, here are its visits:

Robot       Hits (+robots.txt)   Bandwidth    Last visit
Googlebot   194425+260           2506.29 MB   13 Aug 2009 - 17:13
In Google Webmaster Tools, you can set a custom crawl rate for Googlebot.

Quote:Googlebot 194425+260 2506.29 MB 13 Aug 2009 - 17:13

Well, as Googlebot browses your site, it naturally causes transfer too... but that's wanted bandwidth, really, as it helps make your site more popular and easier to find.

If you see a lot of bandwidth being wasted on someone, check that you don't have huge files / attachments (which you can block with robots.txt) or, in the case of PHP, buggy code that produces endless amounts of output (which you should fix or remove). Google is usually the first thing to hit such issues, because Googlebot blindly follows every link it can find and keeps trying to download it, whereas a normal user would press the stop button at some point ;)
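
As a rough way to see where the transfer actually goes, a short Python sketch like this totals up bytes per user agent from an Apache-style "combined" access log, so a runaway page or oversized attachment stands out. The log path and format are assumptions; adjust them for your own server:

import re
from collections import defaultdict

# Assumed location of an Apache/NCSA "combined" format access log.
LOG_PATH = "/var/log/apache2/access.log"

# combined format ends with: "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(r'" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"')

bytes_by_agent = defaultdict(int)
with open(LOG_PATH) as log:
    for line in log:
        match = LINE_RE.search(line)
        if match:
            status, size, agent = match.groups()
            if size != "-":
                bytes_by_agent[agent] += int(size)

# Print the ten heaviest user agents by total transfer.
for agent, total in sorted(bytes_by_agent.items(), key=lambda kv: -kv[1])[:10]:
    print("{:9.1f} MB  {}".format(total / 1e6, agent[:70]))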
Well, if it's going to keep indexing reply pages regardless, is there any way I can optimize it?

http://74.125.93.132/search?q=cache:fqWp...ceweasel-a
The cache you linked is dated July 23rd, 2009, so it's almost a month old. Maybe your robots.txt was not in place back then? Are there any more recent newreply.php pages?

Your robots.txt doesn't have an explicit "Allow: /" as a last line; anything not disallowed is allowed by default, but maybe adding it would rule out some parser quirk?

Google Webmaster Tools lets you test URLs against your robots.txt, check if it says disallowed for any newreply link.
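
If you want to sanity-check locally as well, Python's standard urllib.robotparser can test URLs against the live robots.txt. This is only a sketch with example URLs, and the standard-library parser does plain prefix matching, so it won't apply the Google-style "*" wildcard rules; for those, trust the Webmaster Tools tester:

import urllib.robotparser

robots = urllib.robotparser.RobotFileParser()
robots.set_url("http://vgchat.info/robots.txt")
robots.read()  # fetches and parses the live file

# Plain prefix rules like "Disallow: /newreply.php" are handled correctly.
for url in ("http://vgchat.info/newreply.php?tid=123",
            "http://vgchat.info/showthread.php?tid=123"):
    verdict = "allowed" if robots.can_fetch("Googlebot", url) else "disallowed"
    print(url, "->", verdict)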