How to Prevent Your Blog From Being Marked as “Duplicate Content”

Avoid Duplicate Content: it can ruin your ranking.

Obviously, you should never copy any kind of content from anywhere around the web.

I’m sure you know that Google looks at duplicated content in a very bad way and can even give your website a penalty for it.

But hold on: do you use tags on your blog or website? Did you know they can hurt your rankings?

Avoid duplicate content at all costs.

You must always watch out for duplicate content issues in WordPress.

It turns out that, by default, WordPress doesn’t nofollow your “tag” pages.

Because bloggers who tag posts tend to create zillions of tags, they often end up with exactly one post in many individual “/tag/” directories. This nearly always creates duplicate content, which is not a good thing.
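For example (using example.com as a stand-in for your own domain), the same post can end up reachable at both its own address and a one-post tag page:

example.com/my-great-post/
example.com/tag/great-posts/

To a search engine, that looks like two copies of the same article.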

You’ll want to fix this, and it’s fairly easy: just modify your robots.txt file.

What’s a robots.txt file?
The robots.txt file is a plain text file you place in your root directory. It tells robots not to crawl specific files and directories, thereby eliminating the duplicate content issue.

The robots.txt file for BigBucksBlogger now reads like this:

User-agent: *
Disallow: /*.js
Disallow: /*.png
Disallow: /*trackback
Disallow: /*.css
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /tag/
Disallow: /author/
Disallow: /comments/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-admin/
Disallow: /*?*
Disallow: /*?

I’m keeping robots out of my tag, author, comments and wp-admin directories, out of a number of subdirectories in wp-content, and out of JavaScript, PNG, CSS and trackback files.

Because I don’t use the WordPress default permalinks, I’ve also blocked robots from crawling addresses with query strings. Do this only if you aren’t using query strings in your permalinks; otherwise, you’ll block the ‘bots from your whole site. You really don’t want to do that. Permitting the ‘bots to index duplicate content is bad; indexing no content is worse.
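To make that concrete, here are the kinds of addresses the two query-string rules block (example.com is again just a placeholder):

example.com/?p=123
example.com/my-great-post/?replytocom=42

The first is what the WordPress default permalinks look like, which is why you should skip these rules if you still use them; the second is the sort of query-string variation of a real post that would otherwise get indexed alongside it.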

I could block the ‘bots from other files, but there is generally no need to block them from any file that is never linked. Also, you never want to block robots from your index.php file, your sitemap, or ‘.php’ files in general.
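On the subject of sitemaps: robots.txt can also point the ‘bots toward your sitemap rather than away from it. A single line like the one below, with your own sitemap address substituted in, is all it takes:

Sitemap: http://www.example.com/sitemap.xml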

Once you create the file, just save it with the name robots.txt and drop it in your root directory.

Should you block bots from your categories?

I don’t block bots from my categories directory because I only post excerpts on those pages, so I don’t worry too much about duplicate content there. In fact, since I find the ‘bots often index those pages and bring me traffic, I want them to keep crawling them. (I am planning to change things to post 10 excerpts per page; I think that will make it easier for people who browse the categories to find what they want.)

If you run full articles in your categories, you may wish to block the ‘bots from those. The same holds for all your archives. (I’m going to be modifying my template to show nothing but excerpts at all addresses except the front page and the individual posts. This should prevent a lot of duplicate content issues without the need to block ‘bots.)
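If you do go that route, the rules look just like the tag rule above. Assuming your blog uses the standard /category/ addresses and date-based archives (the years below are just examples; list the ones your archives actually use), something along these lines would do it:

Disallow: /category/
Disallow: /2007/
Disallow: /2008/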

Is there anything else you can do to avoid duplicate content?
Sure. In fact, there is a great Meta Robots WordPress plugin available that lets you tailor nofollows on your blog. One of the options is adding robots meta tags to the head of the pages in your “tag” directories. This will eliminate the duplicate content penalty as well.
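For the curious, a robots meta tag is a single line in the page’s head section. A tag page handled this way ends up with something like the following (the exact output depends on the plugin and the options you pick):

<meta name="robots" content="noindex,follow" />

The noindex part keeps the page out of the index, while follow still lets the ‘bots pass through the links on it.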

Using the plugin also permits you to precisely tailor link-juice flow around your blog. I’ll be using this soon and explaining many of my “nofollow” decisions after I install the plugin.