MyBB Community Forums

Full Version: Excluding junk from Google Index
Pages: 1 2
(2019-11-10, 11:35 PM)Ashley1 Wrote: [ -> ]That's not junk. It's actually correct for how MyBB SEF URLs work, and I have checked your index.

If you want to improve the duplicate content situation then you can install Google SEO plugin.

Further to that, you should avoid making changes, because you can set your SEO back. You don't need robots.txt. Robots.txt is generally only useful for specifying a sitemap and for crawl-budget optimisation.
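For what it's worth, a minimal robots.txt along those lines, one that blocks nothing and just points crawlers at a sitemap, might look like this (the sitemap URL is a placeholder for your own):

```
# Allow everything; robots.txt is only used here to advertise the sitemap.
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap-index.xml
```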
Re the Google SEO plugin - I thought about it.  But I recall reading some warnings about it.

Re robots.txt.  If I had uploaded it before the site was crawled, I could have prevented all manner of junk from being indexed.

I see that Google has indexed reputation reports, user profiles and all manner of other stuff. This is not content; there is no way it is improving SEO.

Anyway, I will look again at the Google SEO plugin.  Thanks for your help.
Every site is different. For some sites, user profiles may be content.

But again, if you don't see it as content then simply add noindex to the member_profile template and the problem is solved.
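To illustrate what that would mean in practice, the noindex would go in the head section of the member_profile template as a robots meta tag, something like this sketch ("follow" is optional and just tells crawlers they may still follow links on the page):

```html
<!-- In the <head> area of the member_profile template:
     tell search engines not to index profile pages. -->
<meta name="robots" content="noindex, follow">
```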
(2019-11-10, 11:49 PM)Ashley1 Wrote: [ -> ]Every site is different. For some sites, user profiles may be content.

But again, if you don't see it as content then simply add noindex to the member_profile template and the problem is solved.
Not to be pedantic, but we are arguing at cross purposes here.  I am referring to what Google sees as "content".  User profiles would not fit into that category.  Hence I referred to anything that is not content or is duplicate as "junk".

A user profile is not going to answer a query that someone types into Google. It does not fulfil the requirements of E-A-T (Expertise, Authoritativeness and Trustworthiness, a big part of Google's recent algorithm updates). And so on.
Not arguing, but if you think that to be the case I will stop contributing to your thread right here.

I have had a look at your site and index, and clearly no SEO planning went into it before publishing the site. Having said that, what you have now is not a train wreck, and adjustments can be made to improve your SEO position.

I don't think you know the difference between duplicate content and "junk". Most amateur webmasters don't. So actually your interests are best served by leaving these things you don't understand alone - or employing a professional.
(2019-11-11, 12:01 PM)Ashley1 Wrote: [ -> ]Not arguing, but if you think that to be the case I will stop contributing to your thread right here.

I have had a look at your site and index, and clearly no SEO planning went into it before publishing the site. Having said that, what you have now is not a train wreck, and adjustments can be made to improve your SEO position.

I don't think you know the difference between duplicate content and "junk". Most amateur webmasters don't. So actually your interests are best served by leaving these things you don't understand alone - or employing a professional.

Sounds good to me.  And ditto.  I don't think you know what you are talking about either.
From Google's FAQs:

Quote:If I block Google from crawling a page using a robots.txt disallow directive, will it disappear from search results?

Blocking Google from crawling a page is likely to remove the page from Google's index.

However, robots.txt Disallow does not guarantee that a page will not appear in results: Google may still decide, based on external information such as incoming links, that it is relevant. If you wish to explicitly block a page from being indexed, you should instead use the noindex robots meta tag or X-Robots-Tag HTTP header. In this case, you should not disallow the page in robots.txt, because the page must be crawled in order for the tag to be seen and obeyed.

How long will it take for changes in my robots.txt file to affect my search results?

First, the cache of the robots.txt file must be refreshed (we generally cache the contents for up to one day). Even after finding the change, crawling and indexing is a complicated process that can sometimes take quite some time for individual URLs, so it's impossible to give an exact timeline. Also, keep in mind that even if your robots.txt file is disallowing access to a URL, that URL may remain visible in search results despite the fact that we can't crawl it. If you wish to expedite removal of the pages you've blocked from Google, please submit a removal request via Google Search Console.

A robots.txt file and a robots noindex meta tag can have the same ultimate outcome, the pages not being indexed, but robots.txt will stop them being crawled at all. According to the above, you can change the robots.txt so that Google can come back and re-crawl the page, and it will remove the page from the index if you add the noindex meta tag, though it says the timing isn't guaranteed. Ultimately, the best way to force a removal would be through Search Console (I don't know if this allows removal via URL pattern to do all the URLs of a particular type at once).
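The X-Robots-Tag alternative that the FAQ mentions can be set at the web server level instead of editing templates. A sketch for Apache (assuming mod_headers is enabled, and using member.php, MyBB's profile script, purely as an example target):

```apache
# Send a noindex directive for profile pages without touching templates.
# Requires mod_headers; "member.php" is MyBB's member/profile script.
<Files "member.php">
    Header set X-Robots-Tag "noindex"
</Files>
```

This is useful where you can't easily add a meta tag, e.g. for non-HTML resources, but it achieves the same thing as the meta tag: the page must still be crawlable for the directive to be seen.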

A possible option to deal with the individual post URLs would be to set a canonical on the post pages to point to the originating thread, but I'd probably go with the approach the Google SEO plugin takes on that.
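For reference, that canonical approach would amount to a link element in the head of the post-page template, pointing at the parent thread. A sketch, with placeholder URLs and IDs:

```html
<!-- On a single-post page (e.g. showthread.php?pid=123), tell search
     engines the originating thread is the canonical version. -->
<link rel="canonical" href="https://www.example.com/showthread.php?tid=456">
```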
(2019-11-11, 12:25 PM)Matt Wrote: [ -> ]From Google's FAQs:


... Ultimately, the best way to force a removal would be through Search Console (I don't know if this allows removal via URL pattern to do all the URLs of a particular type at once).

It's called the removal tool, but it does not remove the URL; it only hides it for 30 days, after which the URL is returned to the index, unless Google drops it after seeing a noindex directive or a 301 redirect.

And this method is only for Google; what about Bing and the other search engines?
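On the 301 route mentioned above, which works for all search engines rather than just Google, a sketch in an Apache .htaccess file (mod_rewrite assumed; the source and target paths are placeholders) could be:

```apache
# Permanently redirect an obsolete URL to its replacement, so that
# search engines eventually drop the old URL from their indexes.
RewriteEngine On
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
```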