Two tips to avoid Duplicate Content:
Robots.txt or Meta Robots WordPress Plugin
Do you use tags? Did you know they can hurt your Google PageRank? And did you know you can fix that?
Reading Graywolf’s blog, I was reminded to watch out for duplicate content issues with WordPress. It turns out that, out of the box, WordPress does nothing to keep “tag” pages out of the search engines’ indexes.
Because bloggers who tag posts tend to create zillions of tags, they often end up with exactly one post in many individual “/tag/” directories. This nearly always creates duplicate content, which is not a good thing.
You’ll want to fix this; it’s fairly easy. I fixed the issue by modifying my robots.txt file.
What’s a robots.txt file?
The robots.txt file is a plain text file you place in your root directory. It tells robots not to crawl specific files thereby eliminating the duplicate content issue.
The robots.txt file for BigBucksBlogger now reads like this:
User-agent: *
Disallow: /*.js
Disallow: /*.png
Disallow: /*trackback
Disallow: /*.css
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /tag/
Disallow: /author/
Disallow: /comments/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-admin/
Disallow: /*?*
Disallow: /*?
I’m keeping robots out of my tag, author, comments, and wp-admin directories, out of a number of subdirectories in wp-content, and out of javascript, png, css, and trackback files.
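A quick note on the pattern syntax: in the wildcard extensions Google supports, * matches any run of characters and $ anchors a pattern to the end of the address. A couple of hypothetical URLs matched against the rules above:

Disallow: /*? blocks /index.php?p=123 (a ? appears after some characters)
Disallow: /*/feed/$ blocks /my-post/feed/ but not /my-post/feed.xml (the $ requires the address to end there)

Keep in mind these wildcards aren’t part of the original robots.txt standard, so minor bots may ignore them.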
Because I don’t use the WordPress default permalinks, I’ve also blocked robots from crawling addresses with query strings. Do this only if you aren’t using query strings in your permalinks; otherwise, you’ll block the ’bots from your whole site. You really don’t want to do that. Permitting the bots to index duplicate content is bad; indexing no content is worse.
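To see why, compare the two permalink shapes. These addresses are made up, but the shapes are what WordPress produces:

Default permalinks: http://www.example.com/?p=123
Pretty permalinks: http://www.example.com/2007/10/my-post-title/

With default permalinks, every post’s address contains a query string, so Disallow: /*? tells the ’bots to skip every single post.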
I could block the bots from other files, but there is generally no need to block robots from any file that is never linked. Also, you never want to block robots from your index.php file, your sitemap, or ‘.php’ files in general.
Once you create the file, just save it with the name robots.txt and drop it in your root directory. It should then be reachable at the top of your site (for example, http://www.example.com/robots.txt).
Should you block robots from categories?
I don’t block bots from my categories directory because I only post excerpts on those pages, so I don’t worry too much about duplicate content there. In fact, since I find the ’bots often index those pages and bring me traffic, I want the ’bots to crawl them. (I am planning to change things to post 10 excerpts per page; I think that will make it easier for people who look at categories to find what they want.)
If you run full articles in your categories, you may wish to block the ’bots from those; the lines below show what that would look like. The same holds for all your archives. (I’m going to be modifying my template to show nothing but excerpts at all addresses except the front page and the individual posts. This should prevent a lot of duplicate content issues without the need to block ’bots.)
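If you do block them, the additions follow the same pattern as the rest of the file. Assuming the default WordPress category base (yours may differ if you’ve changed it in your permalink settings):

Disallow: /category/

Date archives depend on your permalink structure, so look at what your archive addresses actually look like before writing a rule for them.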
Is there anything else you can do to avoid duplicate content?
Sure. In fact, there is a great Meta Robots WordPress plugin available that lets you tailor the robots meta tags on your blog. One of the options is adding noindex meta tags to the headers of pages in your “tag” directories. This eliminates the duplicate content penalty as well.
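For the curious: the tag the plugin puts in a page’s head section looks something like this. The exact output depends on your settings; this is just the standard robots meta tag syntax:

<meta name="robots" content="noindex,follow" />

noindex keeps the page out of the index, which is what cures the duplicate content problem; follow lets the ’bots keep following the links on the page, so your link juice still flows.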
Using the plugin also permits you to tailor precisely how link juice flows around your blog. I’ll be using it soon and will explain many of my “nofollow” decisions after I install the plugin.
Tags: duplicate content, plugins, robots.txt, seo, tags, WordPress
Related Posts:
- Blog Security: htaccess block
- Lucia's Linky Love for WP 2.3: Option to follow trackback immediately.
- Improve Your Better Feed: Wordpress Plugin
- Andy Beard Wants Dramatic Titles: Just Like Muhamed Saleem's.
Comments
15 Responses to “Two tips to avoid Duplicate Content: Robots.txt or Meta Robots WordPress Plugin”
Cheers for the tips. I’ll check out Graywolf too. Don’t think I’m affected, yet at least, but I guess it’s better to be safe than sorry.
Lucia,
my robots.txt is quite simple:
User-agent: *
Disallow: */trackback*
Disallow: /wp-*
Disallow: */feed*
User-Agent: MediaPartners-Google
Allow: /
I’m no expert (I didn’t write it myself), but I believe you should add the last two lines if you use Adsense on any of the blocked pages. Otherwise you’re blocking the Adsense bot as well as the search bot and won’t get ads in context.
Also, note the /wp-* line. This should block everything that starts with wp, including files (such as wp-admin.php) and folders (such as wp-includes), etc. I can’t see any reason to let the bots into anything that starts with wp (unless you have a wp tag or category).
I don’t use tags yet, and I leave blocking category and archive pages (I know you don’t want to block those) to the All In One SEO plugin. It adds noindex,follow to the HTML of the page itself rather than using robots.txt.
As I say, I’m no expert on this, but I thought I’d share my setup.
[…] http://money.bigbucksblogger.com/two-tips-to-avoid-duplicate-content-robotstxt-or-meta-robots-wordpr… Tags: duplicate content, google, noindex […]
I believe that when you disallow a folder in the robots.txt, this completely prevents the spiders from going to that certain directory, which prevents them from both indexing the page and following the links on the page. The meta tags above, allow for spiders to visit the page and follow all of the links, but not index the page.
Awesome article, updating my .htaccess now, thanks!
Thanks for this, just setting up a WP site for the first time and this was invaluable help.
[…] Two tips to avoid Duplicate Content, Robots.txt or Meta Robots WordPress Plugin. […]
What’s the effect if you don’t use them?
Do you get banned or what?
Great time saver, this article! Thank you very much.
I was plagued with duplicate content from the tags on my site; they generated the same meta description over and over again. For my posts and pages, I use the All in One SEO plugin, where you can set the description etc. per item, but I have no control over the tags, as there is no page to manipulate. The robots.txt will solve this issue.
Good… but how about blogger.com and their blogspots?
Hi
I have a blog that has only one post. It first ranked well for my favorite keyword, but after being indexed it dropped to page 2 of Google. Do you think this is due to duplicate content, since the home page has the same copy as the only post’s page?