Dear Jason Calacanis,
I was visiting your blog today and noticed that Mark Simon criticized Mahalo in Advertising Age. Evidently, he can’t believe that human-powered search can work. He specifically says:
6. “HUMAN-POWERED” SEARCH ENGINES.
The reason search engines are much better places to find information than directories is because they leverage automation to do the grunt work that human editors used to do.
Well, Jason, I can’t help but agree. As I’ve pointed out, the guides you have working on Mahalo are having some difficulty applying the style guide, catching dead links, and catching sneaky redirects that appear after the page is published. Still, I know you plan to persist in the insane endeavor you call “Mahalo”.
As long as you are, I’d like to make a few suggestions. I think you will recognize the underlying goal of each: let computers and ‘bots do what they do well, and let humans do what they do well!
So, here goes:
- Create ‘bots to count ads:
Problem: Currently, your guides evidently need to sift through hundreds and hundreds of ad laden pages to find the ones that aren’t full of ads. This is a costly waste of time. Their time would be better spent comparing the less ad-intensive sites and deciding which is best.Solution: Write a ‘bot that permits the user to enter a Google search. After the Google search is run, let the ‘bot load the content of each page, identifies each (using the same method available to the FireFox extension AdBlockerPlus) and count the number of ads. Then return a link results page that includes only pages with limited numbers of ads.
To catch cloaked affiliate ads, you may need to extend the ‘bot to follow all links and watch whether the link passes through a redirect. Flag those as suspicious.
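A minimal sketch of that ad-counting ‘bot in Python. The patterns and the three-ad limit are made up for illustration; a real ‘bot would pull in Adblock Plus’s actual filter lists:

```python
import re

# A few substrings in the spirit of Adblock Plus filter lists.
# Illustrative only -- a real 'bot would load a full filter list.
AD_PATTERNS = [
    r"doubleclick\.net",
    r"googlesyndication\.com",
    r"/banners?/",
    r"adserver",
]

def count_ads(html):
    """Count ad-like URLs in a page's HTML."""
    urls = re.findall(r'(?:src|href)="([^"]+)"', html)
    return sum(
        1 for url in urls
        if any(re.search(pat, url) for pat in AD_PATTERNS)
    )

def filter_results(pages, max_ads=3):
    """Keep only (url, html) results whose ad count is under the limit."""
    return [url for url, html in pages if count_ads(html) <= max_ads]
```

The guides then only ever see the short list that survives `filter_results`.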
- Create ‘bots to discover redirects on Mahalo links:
Problem: Aggressive marketers will identify Mahalo pages and persuade the page owner to let them redirect the page to a substitute page. Josiah Cole notes this happened to Mahalo’s Jesus in Food page.
Solution: Store the content of the original pages. Write a ‘bot that loads each link on Mahalo and records the URL of the final destination page. If the URL has changed, note this in a database and send a human to check that page promptly.
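The redirect-checking ‘bot is barely a page of Python; `urllib` follows HTTP redirects on its own, so all the ‘bot has to do is compare where each link ends up against where it used to end up (the function names here are my own invention):

```python
import urllib.request

def final_url(link):
    """Follow a Mahalo link and return where it actually ends up.
    urllib follows HTTP redirects automatically."""
    with urllib.request.urlopen(link) as resp:
        return resp.geturl()

def find_moved_links(stored_destinations, fetch=final_url):
    """stored_destinations maps each Mahalo link to the URL it resolved
    to when the page was published. Returns the links that now land
    somewhere new, so a human can check them promptly."""
    suspicious = []
    for link, old_dest in stored_destinations.items():
        if fetch(link) != old_dest:
            suspicious.append(link)
    return suspicious
```

Passing `fetch` in as a parameter also lets you test the logic without hitting the network.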
- Create ‘bots to notice changes in content.
Problem: Links die. This has already happened. I identified several dead-ish links to news sources on your Change page. They weren’t entirely dead, but the story had moved.
Solution: Cache the source of all pages linked on Mahalo. Write a ‘bot that compares the current version of each page to the cached version using the sorts of algorithms used by plagiarism checkers. (Andy Beard described a plagiarism checker.) When the content of the new page deviates substantially from the old content, send a human to check it.
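You don’t even need a real plagiarism engine to start; Python’s standard library can do a rough content-overlap score. A sketch, with the 0.5 threshold picked out of thin air:

```python
import difflib

def similarity(old_text, new_text):
    """Rough content overlap between two versions of a page,
    in the spirit of a plagiarism checker (1.0 = identical)."""
    return difflib.SequenceMatcher(None, old_text, new_text).ratio()

def pages_needing_review(cache, fetch, threshold=0.5):
    """cache maps url -> the page text as it looked when the Mahalo
    page was built; fetch(url) returns the current text. Pages whose
    content has drifted below the threshold go to a human."""
    return [url for url, old in cache.items()
            if similarity(old, fetch(url)) < threshold]
```

A real deployment would want something sturdier (shingling, say), but even this catches a page that has been wholesale swapped out.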
- Create online rubrics to ensure style guides are followed.
Problem: Those style guides are long. Sometimes they are ambiguous. There is some inconsistency in applying the rules.
Solution: Have the guides fill out a checklist covering the top ten spammy features only humans can find. No content above the fold? Check a box. Found a contact email? Check. Phone? Check. That sort of thing. If the page has too many danger flags, the rubric will catch it.
- Create ‘bots that check for spam words:
Problem: Yes, searches for “Jason Calacanis” point to links about lesbians making out.
Solution: When that ‘bot I suggested above loads all the Google search results, have it compare the content to a blacklist of spam terms. “Lesbians” is in my SpamKarma blacklist; put it in your Mahalo blacklist. If the ‘bot finds a bunch of porn terms, you can be pretty sure it’s a porn site. The guides don’t need to waste their time checking it!
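The blacklist check is the easiest ‘bot of the bunch. A sketch (the blacklist and the three-hit threshold are illustrative; “lesbians” is the one term that comes straight from my SpamKarma example above):

```python
import re

# Hypothetical blacklist; a real one would be much longer.
BLACKLIST = {"lesbians", "viagra", "casino"}

def spam_hits(html):
    """Return every blacklisted word found in the page text."""
    words = re.findall(r"[a-z]+", html.lower())
    return [w for w in words if w in BLACKLIST]

def looks_like_spam(html, threshold=3):
    """A bunch of blacklisted terms and the guides never load the page."""
    return len(spam_hits(html)) >= threshold
```

One hit might be innocent; a bunch of hits means nobody has to click through.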
- Send me a huge box of luxury yarn. Or maybe season tickets to the Lyric Opera. Or something. Ok. This won’t help you at all. But you keep joking you should put me on the payroll.
What you really should do is find someone whose first reaction to criticism of Mahalo is to assume there is a problem, and who then says: “If this is a problem, how could we fix it?”
‘Bots can help this insane idea scale.
Heck, if you write the correct ‘bots to help the humans, there is a chance you can actually get Mahalo to work within some reasonable budget. Let’s face it: you can hope the public will report your dead links, but in reality you’ll find that, more often, “the public” will be very eager to suggest their own sites. (For example: my knitting blog is the best blog and belongs on Mahalo. Period.)
Heck, with luck, you’ll find these ‘bots will help you explain why my knitting blog is not the best one (even though it is.)
P.S. I thought explaining how the guides can use JavaScript to quickly find the duplicate content I told you about before was just the right solution. Not too complicated, and quickly implemented. I still think Mahalo isn’t any good yet, but I can now see why they pay you the big bucks.
P.P.S. To add to the list: there’s no reason you can’t write the ‘bot to do the plagiarism checks before a human ever loads the page!
Mahalo for all the great ideas… keep them coming!
We certainly have bots in the works, just like DMOZ had back in the day. The bots will check for things like dead links, changed pages, and redirects. You can expect to see the bots in action in the next month or so.
With regard to checking the number of ads, a bot’s not going to be able to do that too well, and to be honest it’s not really that major an issue for us. We like to really check out the pages we link to… so, if there are too many ads we know and we put a warning up. If we make a mistake the cost is very low (i.e. “ouch! a person had to see a couple of extra ads” or “ouch! someone missed a site because we thought it was too ad heavy”). No biggie. Also, if this happens folks will report it to us.
In terms of following the templates we’re creating let me say a couple of things:
a) we’re 100 days old.
b) the templates are
continued….