MyBB Community Forums

Full Version: Huge list of "Access Denied" errors on Google Webmaster Tools
We have over 1000 "access denied" errors in Google Webmaster Tools that appeared out of nowhere. When I look at the errors, the Google crawler is crawling stuff like calendar events and the "post a new reply" option on threads, even though I had initially disabled posting and the calendar for bots in the Admin panel.

Some of the stuff the Google bot is crawling:

ratethread.php
newreply.php
sendthread.php
calendar.php

Over 1000 errors like this! All the important stuff (threads etc.) is crawled perfectly; it's just these pages that Google should not be crawling in the first place. When I click on the error URLs, it says you're not authorized to enter, which is obvious, as neither guests nor bots can create new replies!

What can I do? I believe I can block some pages in robots.txt, but I don't know if ".php" files can be blocked. "calendar.php" alone has 600 errors, all URLs with different tails (ending in different numbers).

Please anyone?
Quote:I believe I can block some pages in the robots.txt
yes, you should use robots.txt
if you are using the Google SEO plugin, it ships with a ready-made robots file (robots.example.txt)
change /MYBB to /talk and rename it to robots.txt so that it works for your forum
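for example (a rough sketch; the exact lines depend on your plugin version), entries in robots.example.txt like

Disallow: /MYBB/calendar.php
Disallow: /MYBB/newreply.php

become, after replacing /MYBB with your forum folder and saving the file as robots.txt in your web root:

Disallow: /talk/calendar.php
Disallow: /talk/newreply.php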
can you please post the contents of your robots.txt here?

I think there is some trouble with it.
(2013-05-28, 03:12 AM).m. Wrote: [ -> ]
Quote:I believe I can block some pages in the robots.txt
yes, you should use robots.txt
if you are using the Google SEO plugin, it ships with a ready-made robots file (robots.example.txt)
change /MYBB to /talk and rename it to robots.txt so that it works for your forum

Yes, I'm using the Google SEO plugin, but I'm sorry, I don't understand what you're saying. I have a typical robots.txt I created myself and placed in the root of the domain (www.example.com, not www.example.com/talk/).

Where is this robots.txt from the Google SEO plugin, and where do I put it? I would appreciate knowing, because if the robots.txt from the Google SEO plugin already takes these 403 errors into account, it would help a lot!


(2013-05-28, 06:03 AM)remshad Wrote: [ -> ]can you please post the contents of your robots.txt here?

I think there is some trouble with it.

It's a very simple robots.txt; I doubt it's causing problems as it is. As of today I'm only blocking Baidu, as I find it bothers me every day (Bing should learn from them, what a lazy spider Bing is).

This is my robots.txt:

User-agent: *
Disallow:

User-agent: ia_archiver
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-video
Disallow: /

User-agent: Baiduspider-image
Disallow: /
you can use the code below for your robots.txt in the homepage folder (http://www.example.com)

User-agent: ia_archiver
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-video
Disallow: /

User-agent: Baiduspider-image
Disallow: / 

User-Agent: *
Disallow: /talk/captcha.php
Disallow: /talk/editpost.php
Disallow: /talk/misc.php
Disallow: /talk/modcp.php
Disallow: /talk/moderation.php
Disallow: /talk/newreply.php
Disallow: /talk/newthread.php
Disallow: /talk/online.php
Disallow: /talk/printthread.php
Disallow: /talk/private.php
Disallow: /talk/ratethread.php
Disallow: /talk/report.php
Disallow: /talk/reputation.php
Disallow: /talk/search.php
Disallow: /talk/sendthread.php
Disallow: /talk/task.php
Disallow: /talk/usercp.php
Disallow: /talk/usercp2.php
Disallow: /talk/calendar.php
Disallow: /talk/*action=emailuser*
Disallow: /talk/*action=nextnewest*
Disallow: /talk/*action=nextoldest*
Disallow: /talk/*year=*
Disallow: /talk/*action=weekview*
Disallow: /talk/*action=nextnewest*
Disallow: /talk/*action=nextoldest*
Disallow: /talk/*sort=*
Disallow: /talk/*order=*
Disallow: /talk/*mode=*
Disallow: /talk/*datecut=*
Allow: /
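
Each Disallow line is matched against the start of the URL (path plus query string), so a single rule also covers every query-string variation of that script. A rough illustration with made-up example URLs:

Disallow: /talk/calendar.php
# blocks /talk/calendar.php?action=weekview&week=n2239
# blocks /talk/calendar.php?month=6&year=2013
# does not affect /talk/showthread.php?tid=123

That is why one calendar.php rule is enough to cover all 600 of those errors.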
(2013-05-28, 08:41 AM).m. Wrote: [ -> ]you can use the code below for your robots.txt in the homepage folder (http://www.example.com) [...]

Thank you! :)

I had actually seen this robots.txt template and was about to post it here to double-check whether it was correct. I'm glad it looks like it is then.

Can I ask, is the use of the "Allow" directive correct? Shouldn't it be "Disallow:"? I'm asking because I have read that Allow should never be used, as not all bots understand it, and that Disallow is the directive every robot understands.

Thanks again :)
^ Allow: / can be removed if you doubt its usage. it is not compulsory (see Allow directive)
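if you ever do want it, the usual pattern is to open up one path inside a folder you have otherwise blocked, for example (hypothetical paths, just to show the idea):

Disallow: /talk/archive/
Allow: /talk/archive/index.php
# Googlebot applies the most specific matching rule, so index.php stays crawlable
# while everything else under /talk/archive/ stays blocked

most crawlers that do not support Allow simply ignore the line, which is why it is optional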
(2013-05-28, 09:52 AM).m. Wrote: [ -> ]^ Allow: / can be removed if you doubt its usage. it is not compulsory (see Allow directive)

Thank you again!

I was just reading more about it, and apparently "Allow:" is useful when you disallow a lot of folders and then want to make sure Google still crawls the rest of the site.

The only problem is that I'm reading some bots sometimes treat Allow as Disallow, especially the older ones. And I'm reading rumors that the Bing bot is as stupid as it is lazy, so it may interpret Allow as Disallow.

My question is: if I remove the Allow line, will Google still crawl everything except what is mentioned in the Disallow lines above? I'm thinking that yes, that'll be fine, but I want to double-check before wrapping this up :D

Also, I'm getting more and more 403 errors and even some 404 errors. I've read of other people with 7000+ errors, which can impact your SEO. I have just seen that the plugin folder includes an example robots.txt, so for anyone having the same crawling errors: READ THE INSTRUCTIONS :D I read all about installing the plugin but never actually read the other text files in the folder. Even so, I would have asked this question anyway, but I hope this issue of mine helps others searching for answers in the future.

Thanks '.m.' for your help. If I could double-check with you what you think about removing the Allow line, that'd be awesome :)

P.S.: I will update and mark this as solved if the fix works, though it may take some days for Google to recrawl.
Hi, I'm new to this forum.
I'm having similar problems: I'm getting thousands of access denied errors from Google.
My site is www.americanflag.com

The robots.txt: http://www.americanflag.com/robots.txt
User-agent: *
Crawl-delay: 10

According to Webmaster Tools they aren't being blocked by the robots.txt file. When I do a Fetch as Google from Webmaster Tools, I get this:

Fetch as Google
This is how Googlebot fetched the page.
URL: http://www.americanflag.com/product-type...-flag.html
Date: Monday, October 7, 2013 at 6:13:40 PM PDT
Googlebot Type: Web
Download Time (in milliseconds): 35
HTTP/1.1 403 Forbidden
Date: Tue, 08 Oct 2013 01:35:18 GMT
Server: Apache
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 279
Keep-Alive: timeout=5
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /product-type/state-flags/nyl-glo-alabama-flag.html
on this server.</p>
<p>Additionally, a 403 Forbidden
error was encountered while trying to use an ErrorDocument to handle the request.</p>
</body></html>


Any help would be GREATLY appreciated.