Two tips to avoid Duplicate Content: Robots.txt or Meta Robots WordPress Plugin

November 5th, 2007

Do you use tags? Did you know they can hurt your Google PageRank, and that you can fix it?

Reading Graywolf’s blog, I was reminded to watch out for duplicate content issues in WordPress. It turns out that the WordPress default doesn’t nofollow “tags”.

Because bloggers who tag posts tend to create zillions of tags, they often end up with exactly one post in each of many individual “/tag/” directories. This nearly always creates duplicate content, which is not a good thing.

You’ll want to fix this; it’s fairly easy. I fixed the issue by modifying my robots.txt file.

What’s a robots.txt file?

The robots.txt file is a plain text file you place in your site’s root directory. It tells robots not to crawl specific files or directories, thereby eliminating the duplicate content issue.
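If you want to check what a given set of rules would block before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the `example.com` URLs below are assumptions made up for illustration; they are not the actual BigBucksBlogger file, and `is_crawlable` is just a helper name I picked.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only; NOT the author's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given rules allow `agent` to crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A tag archive page should be blocked; an ordinary post should not be.
print(is_crawlable(ROBOTS_TXT, "http://example.com/tag/seo/"))
print(is_crawlable(ROBOTS_TXT, "http://example.com/2007/11/some-post/"))
```

This only tells you how a well-behaved robot would read the rules; it says nothing about what any search engine has already indexed.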

The robots.txt file for BigBucksBlogger now reads like this: Read the rest of this entry »

WordPress Vulnerability: Take a little time to check.

November 1st, 2007

Seo Egghead has evidently discovered that WP 2.3.1 is vulnerable to HTML-tainting attacks. (The vulnerability evidently also exists in WP 2.1.) The apparent application is to inject ads into bloggers’ older posts; these would tend to look like paid links. The problem for you would be a potential drop in PageRank.

SEO Egghead recommends bloggers check their posts for inserted links to mp3 sites he has discovered at his site, and provides a plugin for this purpose.

I may be wrong, but I don’t think you need to use his plugin. You should be able to get the same information by clicking “manage” in your dashboard, finding the big “search box” and entering ‘adshelper’. Then, click search. WP will return a list of posts containing links to “adshelper”. Next, repeat the search for ‘softicana’. If both searches return zero posts, you’re clean.

While you’re at it: why assume these are the only hacker-advertisers? Take a little time and search for words like “mp3”, “casino”, “mortgage”, “viagra” and anything else you can dream up. If you find anything, blog about it so other bloggers can learn and check.

With luck, if my suggested method of testing is useless, and you really do need to use the plugin, Seo Egghead will pop in and tell us I’m wrong. (I asked at his blog last night, and I’ll keep checking for an answer.)

Are you wondering how I did?
I seem to be ‘clean’ on ‘adshelper’, ‘softicana’ and a variety of other terms I dreamed up.

Hmmm… Plugin idea
If these sorts of HTML-tainting attacks are common, I should probably write a plugin that periodically scans all blog posts for a standard set of blacklist terms, plus terms in the user’s own blacklist. Monthly checks at all our blogs would let us catch these things and warn others. It would be an easy plugin… hmmm….

If readers do run this test, and any come up “tainted”, I’ll seriously consider writing that plugin. Meanwhile, I need to get through updating all my existing ones first!
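To show roughly what that scan would do, here is a minimal sketch in Python rather than as a real WordPress plugin. Everything in it is an assumption for illustration: the `posts` dictionary stands in for the blog's database, `find_tainted_posts` is a name I made up, and the standard blacklist holds only the terms mentioned above.

```python
from typing import Dict, Iterable, List

# Terms mentioned in this post; a real standard list would likely be longer.
STANDARD_BLACKLIST = ["adshelper", "softicana", "mp3", "casino", "mortgage", "viagra"]


def find_tainted_posts(
    posts: Dict[int, str], user_terms: Iterable[str] = ()
) -> Dict[int, List[str]]:
    """Map each post ID to the blacklist terms found in its body.

    Matching is a case-insensitive substring search, so it can produce
    false positives (for example, a legitimate post about mortgages).
    """
    # Merge the standard list with the user's own list, dropping duplicates.
    terms = sorted({t.lower() for t in [*STANDARD_BLACKLIST, *user_terms]})
    hits: Dict[int, List[str]] = {}
    for post_id, body in posts.items():
        lowered = body.lower()
        found = [t for t in terms if t in lowered]
        if found:
            hits[post_id] = found
    return hits


# Hypothetical posts standing in for the blog's database.
posts = {
    1: "A clean post about robots.txt.",
    2: 'An old post with <a href="http://adshelper.example/x">free mp3</a> injected.',
}
print(find_tainted_posts(posts, user_terms=["poker"]))  # {2: ['adshelper', 'mp3']}
```

A hit only means the post deserves a look by hand; the sketch does not try to tell an injected link from something you wrote yourself.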

Two Lessons About Search: What I learned by ranking #2 for “PageRank Zero October 2007”!

October 31st, 2007

Do you check your referrers? I do. I even try to learn things about search from my referrers. Today, I learned two things when I investigated why I had a high rank for the Google search PageRank Zero October 2007.

What did I learn? First, no matter what else happened during the PageRank dust up, Google still likes older pages. Second, we should all give some attention to our archives.

Now, a bit of background. When I saw my high rank for PageRank Zero October 2007, I thought three things:

  1. Google relevancy on this search term is not so hot.
  2. Who ranks #1 for “PageRank Zero”? And most importantly:
  3. Archives matter.

These thoughts led to a bit of investigation, from which I “learned” a thing or two. Below, I’ll expand on these thoughts, and provide the lessons they taught me about Google search.

Why do I think Google’s relevancy for this search is not so hot?

Two reasons.

  1. The #2 result was the top page of my monthly archive. The top page of my October archive was relevant for this search several days ago, when it matched the current Google cache. That cache shows text from Ten Google Page Rank Haikus, which matches the topic of that search rather well.

    Today? There is no mention of “PageRank Zero” on that url.

  2. The #1 result for that particular long tailed search is Courtney Tuttle’s Going From PageRank Zero To PageRank Hero. (I’m condomizing the link in my never ending effort to seize the #1 position for totally useless search terms!)

    Sounds relevant, right? The problem? Whoever was searching for “PageRank Zero October 2007” likely wished to read articles about the “Google Page Rank Debacle of October ’07”. Courtney’s post was published in April; his October article would have been the relevant one.

Lesson:
Before proceeding, it’s worth noticing something: My October Archives page is older than my haiku page. It also has a direct link from my main blog page. Court’s April 2007 article is older still, and of course older than his October article.

Google still seems to like older pages.

Guess what ranks #1 for “PageRank Zero”?

Read the rest of this entry »