Dennis Forbes on Software and Technology


About the Author
Dennis Forbes is a Toronto-based software architect. While focused primarily on the .NET and SQL Server worlds, Dennis frequently ventures outside of this comfort zone into game development and image processing. He has been published in several industry magazines, has been quoted in the Wall Street Journal and has been interviewed by NPR.

He is a vice president and lead software architect at an innovative New York City hedge fund back-office services firm.

Dennis has been working on solutions for the financial, telecommunications, and power generation markets for over 15 years.




The Feed Bag

 
Friday, June 02 2006

Like most professionals in the technology field, I jump to Google and other search engines fairly frequently, in pursuit of hints and documentation to help with various technology dilemmas. A quick search on the web and the archives of newsgroups usually saves a tremendous amount of documentation diving and experimentation.

In return for the huge benefit that other people's documented successes, failures, and experiences bring, I have long been motivated to "pay it forward" by posting technology information online myself, hoping to help some future information seekers (in a similar vein, whenever I get a worthwhile answer on the newsgroups, I usually make a habit of hanging around and answering several questions myself, returning the favour to the community). If the search logs are to be believed, over the years quite a few people have found pertinent information here regarding their software development problems or questions.


Lately, however, I've been noticing a declining ratio of utility to search time, with quality falling largely as a result of a growing number of high-PageRank sites feeding cloaked/phantom pages to the Google search engine. Google sees a question/response that correlates closely with what the information seeker is asking, yet a real user (rather than a search engine spider) who visits finds only the question, with the answer suppressed until the user either a) goes through an irritating, arduous process to sign up as a member on yet another infrequently visited edge site that'll likely sell their email address and bombard them with endless ads, or b) signs up for some sort of pay membership. Given that many of these sites are simply siphoning their content from Usenet or other forums, I'm never going to bother with either option, instead hitting back and following the next link, often to find the same sort of nonsense.

Somehow a small number of these phantom page sites, most of them seemingly linked to by no one legitimate, have taken over the top rankings on Google for a huge range of technical searches. Somehow they haven't been banned by Google yet, despite the fact that cloaked pages are expressly forbidden (if the search engine sees the answer, then any random visitor should immediately see the answer by following the search link, as the search engine hint implies that the immediate page contains the result).
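As a rough illustration of that rule (what the search engine was shown should be what a visitor sees), here is a minimal Python sketch that checks whether the words of a search-result snippet actually appear in the visible text of the page a visitor receives. The function names (`visible_text`, `missing_terms`, `looks_cloaked`) and the 0.5 threshold are hypothetical choices made for this example. It is not a description of how Google or anyone else actually detects cloaking.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def _words(text):
    """Lowercased alphanumeric tokens, in order of appearance."""
    return re.findall(r"[a-z0-9]+", text.lower())


def visible_text(html):
    """Return the page text a reader would see, lowercased and whitespace-collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip().lower()


def missing_terms(snippet, html):
    """Return the snippet words (deduplicated, in order) absent from the page text."""
    page_words = set(_words(visible_text(html)))
    missing = []
    for word in _words(snippet):
        if word not in page_words and word not in missing:
            missing.append(word)
    return missing


def looks_cloaked(snippet, html, threshold=0.5):
    """Flag the page when more than `threshold` of the snippet's distinct words are missing.

    The threshold is an arbitrary assumption for illustration only.
    """
    snippet_words = set(_words(snippet))
    if not snippet_words:
        return False
    return len(missing_terms(snippet, html)) / len(snippet_words) > threshold
```

Given the snippet a search engine displays and the HTML a browser receives for the same link, `looks_cloaked(snippet, html)` returns True when most of the snippet's words are nowhere in the visible page. A mismatch is only a hint, since a snippet can come from somewhere other than the page itself.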

If they feel justified in forcing registration to read often co-opted content, or in charging for a membership, I have absolutely no problem with that -- in fact I think the net would be a better place if there were more commercial opportunities encouraging even more intellectual investments. However, they shouldn't fraudulently mislead search engines and search users, and should instead rely upon normal advertising and word-of-excellence for their great utility. Otherwise they should fold, joining the heap of useless websites that could only fool users into visiting.

Don't waste our bloody time! Google shouldn't be implicitly complicit in these irritating schemes.

Speaking of the problem of apparent phantom pages, today I happened to be looking for CodeSmith, a free (albeit crippleware for as long as I can remember, and not freeware as the author continually claims) code generation tool that I had fond memories of from several years back. Naturally I began the search with the query codesmith freeware.

[Image: codesmith search]

Great, so the page in question apparently talks about the freeware version of CodeSmith. Only it doesn't, and the text in question doesn't appear anywhere on the linked page. Go ahead -- look at that page and search the source for freeware.

In reality the obsolete and deprecated freeware/crippleware version exists on a totally different page (one which doesn't seem to be referenced anywhere else on the CodeSmith page). So why is my time being wasted with the first, desperate-to-turn-a-sale page? Why is Google entirely misleading me about the contents of said page? This sort of bait-and-switch has to stop.

Completely off-topic, but forcing people to register and anxiously wait for a download key to download a crippled, time-limited version is enormously irritating. It really tries my patience when I just want to validate a product, almost certain that it will turn into a multi-license sale, and find that I'm forced to go through some B.S. that will inevitably yield annoying sales emails and followups.

Just let me download the demo, and if it's good I'll buy. If it isn't, you don't deserve the right to harass me with promotions and petitions.

Reader Comments

Just dropping a line to give a massive thank you for your article on hierarchical searches. I have a small app that I use to help kids with hypertension and diabetes. My main job is biostatistics, so I am not an IT professional by any stretch. Your paper solved a problem I never would have handled otherwise (I have "SQL for smarties"). Your generosity of code sharing is great. If you ever need any biostatistical help, just let me know.

John
john @ 6/4/2006 8:21:27 PM
It's a pity that people will go to such annoying lengths to make what I can only imagine is a very small amount of money. On one hand, the search engines need to shape up. On the other hand, I think there will always be dirtbags finding new ways to cheat their useless content up to the top rankings - at least occasionally.

Is there any social page ranking out there? It would be great if a nerd could see in his search results that 12 other nerds found that forum post useful before he clicks it. Nerds wouldn't mind the extra work involved with clicking thumbs-up or thumbs-down on technical information, just like they don't mind blogging useful hacks and the like.
Greg Whitescarver @ 6/5/2006 10:04:01 AM
Click on the "Cached" version of the top codesmith result on google.

http://66.102.7.104/search?q=cache:JZe_Q6ud2K8J:www.codesmithtools.com/+codesmith+freeware&hl=en&gl=us&ct=clnk&cd=1

Towards the top, it explains how the page relates to the search terms:

"These search terms have been highlighted: codesmith"

"These terms only appear in links pointing to this page: freeware"
Eric Hammond @ 6/5/2006 3:49:25 PM
A very good point, Mr. Hammond, and completely correct. My primary contention, however, is with the synopsis/snippet of the article that Google is feeding me. It's one thing for them to match a page based not only on its content but also on the pages pointing at it, but it's quite another to give a completely misleading summary. I don't even know where they got that summary -- it isn't even on the cached version they have -- but it was enough to get me to follow the link.
Dennis Forbes @ 6/6/2006 8:16:18 AM
The text ("A freeware template-based code generator that can generate code for any ") is a DMOZ.org description of the site: http://search.dmoz.org/cgi-bin/search?search=codesmith

Google shows such descriptions for those sites lucky enough to be in DMOZ, which seems a rather reasonable decision.

Can't comment on the software in question, but IMO Google did a reasonably good job here.
AlexC @ 7/12/2006 12:56:48 PM
