Not Solved Excluding junk from Google Index
#1
I've just uploaded a robots.txt file to the root of my MyBB forum. The problem is that a load of junk and duplicate stuff is already indexed in Google.

Does anyone know whether already-indexed pages gradually drop out of the index once you block them with robots.txt? I'm talking about individual posts (as opposed to threads), user profiles, etc.

Here is the robots.txt file. Does it look right to you?

User-agent: *
Disallow: /archive/
Disallow: /private.php*
Disallow: /usercp.php
Disallow: /usercp2.php
Disallow: /ratethread.php
Disallow: /newreply.php
Disallow: /memberlist.php
Disallow: /printthread.php
Disallow: /forumdisplay.php
Disallow: /showthread.php
Disallow: /member.php
Disallow: /calendar.php
Disallow: /newthread.php
Disallow: /editpost.php
Disallow: /moderation.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /sendthread.php
Disallow: /task.php
Disallow: /stats.php
Disallow: /showteam.php
Disallow: /showratings.php
Disallow: /syndication.php
Disallow: /thread-*-lastpost.html
Disallow: /thread-*-nextnewest.html
Disallow: /thread-*-nextoldest.html
Disallow: /thread-*-newpost.html
Disallow: /thread-*-post-*.html
Disallow: /post-*.html
Disallow: /forum-*.html?datecut=9999
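
For anyone wondering how the wildcard lines above behave, here is a minimal Python sketch of the `*` and `$` matching described in RFC 9309. The function names and the sample paths are made up for illustration, and the sketch ignores Allow rules and percent-encoding, so treat it as a rough check rather than a full parser.

```python
import re

# A few of the rules from the robots.txt above.
RULES = [
    "/showthread.php",
    "/thread-*-post-*.html",
    "/post-*.html",
    "/forum-*.html?datecut=9999",
]


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is a plain prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


def blocked(path: str) -> bool:
    """Return True if any rule in RULES matches the given path."""
    return any(rule_matches(rule, path) for rule in RULES)
```

With these rules, a hypothetical `/thread-12-post-34.html` is blocked, a plain `/thread-12.html` is not, and `/showthread.php?tid=12` is blocked because the rule is a prefix match.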
#2
If you want to remove content from the index then blocking with robots.txt won't achieve that - the opposite will happen, and the content will stay in the index forever.

The only way to remove content is to let Google crawl it and set a meta noindex tag.
What goes around comes around
#3
OK fair point.  But blocking using robots.txt prevents further crap from being indexed which is a larger issue in my view.

How would you go about adding a meta noindex tag to the stuff you don't want indexed once it has already been indexed?
#4
You can't combine a meta noindex with a block in robots.txt. Google, or any other search engine that obeys robots.txt, will never crawl the blocked URL and therefore never gets to see the noindex directive.
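
A rough way to see this conflict is to ask Python's standard `urllib.robotparser` whether a crawler may fetch the page at all. This is only a sketch: the `forum.example` host and the helper name are assumptions, and the standard-library parser does plain prefix matching, so it does not understand the `*` wildcard lines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a single blocked page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /memberlist.php
"""


def noindex_can_be_seen(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the crawler is allowed to fetch the URL.

    Only a page that can be fetched can have its meta noindex read.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

If this returns False for a URL, any noindex tag on that page is invisible to the crawler until the matching Disallow line is removed.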

The other issue you have is separating your posts from threads.

Most people make this mistake with robots.txt when they are trying to clean up their index, but you can take my advice and comment out those entries. Then use only meta noindex where you need to.

meta noindex can be set in the relevant page template in the head section.
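
Commenting out the entries just means prefixing each Disallow line with `#`. A minimal Python sketch of that step follows; the function name is made up, and it comments out every Disallow line, which is an assumption about what "those entries" covers.

```python
def comment_out_disallows(robots_txt: str) -> str:
    """Prefix every Disallow line with '# ' and leave other lines alone."""
    out = []
    for line in robots_txt.splitlines():
        if line.strip().lower().startswith("disallow:"):
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

A `User-agent: *` group left with no active rules allows everything, so crawlers can reach the pages again and read any noindex tag on them.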
#5
Thanks for your help Ashley.  


Quote:but you can take my advice and comment out those entries. Then use only meta noindex where you need to.


I don't suppose you could advise how to do this? Where?  Which bit of code??

Quote:meta noindex can be set in the relevant page template in the head section.

Which one is relevant? I have set the forum to use search engine friendly URLs, so thread1.html etc.

So what should I be setting to meta noindex??
#6
If you wanted to remove the memberlist.php page from the index, then you would go to the memberlist template and paste this

<meta name="robots" content="noindex">

above

{$headerinclude}

That's all you need to do to remove memberlist pages and prevent new memberlist pages from being indexed.
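
To confirm the template edit took effect, you can check the rendered page for the tag. Below is a small sketch using Python's standard `html.parser`; the class and function names are made up, and fetching the page is left out, so you would pass in HTML you have saved or downloaded yourself.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag carries noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex
```

Running `has_noindex` on the memberlist page source before and after the edit should flip from False to True if the tag landed inside the head section.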

If you wanted to remove the archive pages from the index, then you would have to make a core code edit in inc/functions_archive.php. Bear in mind, though, that archive pages in the index can actually drive traffic to your site.

As for posts/threads, I can't comment without seeing the site and/or the index contents.
#7
It's the posts/threads that are bugging me the most.

The site is here: https://www.stevejabbaforum.com
#8
And what is an example of a junk URL, in your opinion?
#9
Anything that is not content or is a duplicate of content.
From what I can tell, posts are duplicates of threads, so there is no reason for them to be indexed.
I would consider everything else to be junk, with the obvious exception of the homepage.
#10
That's not junk. It's actually correct for how MyBB SEF URLs work, and I have checked your index.

If you want to improve the duplicate content situation, then you can install the Google SEO plugin.

Beyond that, you should avoid making changes, because you can set your SEO back. You don't need robots.txt. Robots.txt is generally only useful for specifying a sitemap and for crawl budget optimisation.