Dennis Forbes on Pragmatic Software Development
 
Saturday, October 28 2006

Seth talks about the manipulation and gaming of blogs and meme sites, which I mention because the same topic -- opinion swarming & vote stuff -- is one that I've written about previously, and I remain very interested in observing it in action.

  Blogging 
Monday, September 18 2006

I started blogging on September 4th of last year.

I had an internet presence prior to that (content that received several Slashdot mentions, along with a half-decent number of inbound links), but I didn't put up content with the regularity of a blog -- largely as a function of the hassle involved -- and I didn't have RSS, Atom, or any other feed technology (and thus wasn't aggregated into other feeds).

It was just a hodge-podge of random pages prior to putting it into the structured form you see today.

All-in-all the past year has been a very, very rewarding experience: A very credible number of people visit, and from a search-engine perspective the results have been extraordinarily successful. Strange seeing several dozen people a day from my hometown coming by just because I happened to mention it in a blog entry.

To quote from a September 4th entry-

The question I am pondering, then, is whether the only way one can remain internet credible (in search engine terms) is to integrate heavily within the blogging community, quid-pro-quoing endless links and trackbacks, ingratiating oneself with other bloggers, posting meaningless comments about every posting every other blogger makes (which they will of course do in turn). It's a sort of super-pyramid scheme, but with no bottom level.

Thankfully I've never had to quid-pro-quo or ingratiate to maintain PageRank. In fact I think I've maintained a fairly antagonistic approach to many of the popular blogs and bloggers, and I've seldom resorted to inventing "material" out of mentioning other blogs.

Which brings up an interesting topic - I was chatting with a peer about blogging and the effort/reward ratio, and they asked if I felt that I had "succeeded" in this venture: Sure, they pondered, I'd gotten a lot of mentions, along with a couple of heavily visited pages, but overall I still sit quite low on the list of bloggers. My Alexa rankings stink (though I should mention that Alexa rankings are laughably useless outside of the top internet sites. Alexa ratings are culled from users utilizing the Alexa or A9 toolbars, which is a vanishingly small number of users, clustered into certain demographics. Just a couple of users occasionally visiting with the toolbar has an absurdly large impact, so if I wanted to shoot up in the rankings, I'd just recommend the toolbar every month. As a case-in-point, at one point I noticed that my Alexa ranking had jumped considerably, but became suspicious that a disproportionate number of visitors visited the webstats page...which of course only I visit. I realized it was me that was inadvertently impacting the rankings when I had installed the A9 toolbar, so I removed it), and I'm not even among the top 1000 bloggers (by one metric I'm #5,269).

I have something like 118 bloglines subscribers, versus say 21,000 for someone like JoelOnSoftware (bloglines is only one of many aggregators, and Joel has far more subscribers overall, but it's a metric that is meaningful in a relative sense).

Yet I am thankful for every single reader, and the success of this blog is worlds beyond what I imagined. More important than quantity is quality, and some of the feedback leads me to believe that a great group of people have decided to drop by every now and then (even though many don't use feed readers, and just added it to their bookmarks for a once-a-month browse. That's the same technique I use for most blogs). Sure, complimenting your readers is a suspect activity, and is often driven by egotism above all else, but I really mean it: I couldn't have asked for a better readership.

And perhaps this will come off as cheap or like sour-grapes (which it most certainly isn't -- I set out expecting an occasionally accidental search visitor, and never anticipated the success this has seen), but there are some pretty easy ways I could have modified the message a bit to build and maintain a much greater blog presence, but that wasn't my goal.

I could...

  • Blog more frequently. As it is, with two pre-school children, it's ridiculously difficult finding time to get posts in, but sometimes I just have to get a thought out there so it's a great exercise.
  • Blog a more consistent message. No thanks. One of the things I love about this is that I can blog about a .NET video codec one day, Wikimedia on Windows the day after, HTML compression the next, SSIS packages the next, and my opinion about work environments after that.

    Occasionally I hesitate, wondering "will the people who subscribed after {X} hit the front-page on Reddit really care for this?" -- for instance when putting up navel-gazing entries like this -- but then I realize that's getting caught in the classic trap of limiting oneself to a narrow range of topics. With readers and browsers, people can just hop past things they aren't interested in, and while I'm sad for anyone who unsubscribes or /dev/nulls this blog, that's preferable to diluting it to a serve-everyone-but-really-no-one message.
  • Blog a more generalist message. I certainly don't mean to be elitist with some of the entries, but it's the nature of the beast that some of them aren't going to entertain or cater to generalists or technical tourists. Compare this to almost all of the top blogs in this space -- apart from a couple of very rare exceptions, most seldom venture outside of the realm of easy, accessible observations and pondering. The "colour of the shed" sort of entries that everyone can add their voice to the chorus with their opinion.
  • Blog towards retention. This is really achieved by keeping a general, accessible, non-threatening topic going, consistently pounding the same theme, or by creating controversy and debate where none really exist, but it's also built by minimizing the number of outbound links, and maximizing the number of internal links. This has never motivated the way I author posts, and if I lose a small percentage of users with each outbound link as they go off exploring Wikipedia or Seth Godin or anything else, that's something I'm very happy about. Having said that, the number of internal links seems to be increasing on here as of late -- as I've built a larger and larger volume of content, it just seems like I'm a pretty good resource to reference!

None of these techniques are secrets, but they're only acceptable modus operandi if your primary goal is, well, blogging. That isn't my primary goal by a long shot, and I have no ambitions of becoming a professional blogger. Instead I'm motivated to talk to, and hopefully influence -- and maybe even impress -- intelligent and influential people.

In the coming few months (or more correctly weeks) I have several very, very exciting things that are going to come out, including the most exciting and innovative web idea I've ever had. It's only going to get better.

But I'll never compromise the message, and I'll never let metrics and stats give me misdirected motivation.

Tuesday, July 25 2006

The Wall Street Journal Mention

Last Wednesday I was mentioned in the Wall Street Journal (right there on the front of the second section of one of the world's most prestigious newspapers), being referred to as the "world's pre-eminent domainologist" (an article that has been referenced in countless other sources now, including some errant attributions, such as the Toronto Star -- my hometown paper -- seemingly making me a Verisign employee, which of course I'm not). 

Apparently -- or so my wife tells me, given that I don't listen to or read anything that involves me in any way, and even when she talks about this stuff I cover my ears and basically repeat "LaLaLa"s to drown it out -- it was a well-written, humorous piece. While I apparently played the part of a fringe bit-player, my name does appear quite early in the article, and that's pretty neat to me.

The mention doesn't bring me monetary rewards, and it really doesn't contribute to my professional success in any measurable way (though it's very neat being mentioned, and it was a hugely fun process working with Lee to get the raw material and provide some basic quotes, I'm not really in the business of domain names, and it isn't really a hobby of mine -- being attributed in such a way isn't really something I really want to leverage), but it is yet another weird, discordant mention in mainstream media.

So long as it isn't the notorious sort of mention, somehow it all works into my grand plan of world domination. Bwahahahaha! <rubs eyebrows>

Origins

It all began with a couple of emails from Lee. He indicated that he was from the WSJ, and was interested in talking with me about an article he was considering. After some difficulty finding a common point of availability, we finally chatted in person. This was around Wednesday of the week before.

That evening Lee recorded an initial phone interview, indicating that he had come across my article from back in March, and knew that it had seen a lot of success (for those who didn't see it, it was an article that took off like wildfire across the net, seeing front page action on Digg, Reddit, and mentions from numerous "A-list" bloggers. Quite a few of the entries on here have seen wide "link-love", but the domain name entries absolutely blew all prior -- and following -- records away, seeing close to 100,000 visitors a day for a period of time, still maintaining a lot of incoming interest).

Given that he hadn't come across similar research (he did ask if I knew anyone else doing similar research, perhaps probing to see if I was just a sub-eminent domainologist who would defer to a greater authority), he decided to base his article on information I provided, both in the initial article and in numerous follow-up queries he asked me to run.

One particularly exhaustive query took around 20 hours of runtime.

All in all it was a lot of fun, and from my end was nothing more than a couple of very brief phone interviews, and then some randomly kicked off queries and emailed results.

What a WSJ Mention Gets You

Despite the fact that the article in question provides limited personally identifying information (and while it's accurate, it is a bit misleading for some. For instance I'm not in New York City -- I'm actually here in a suburb of Toronto -- and the article of course apparently doesn't mention this blog), the immediate effect of the article was dozens of phone calls from people across the US -- and the world -- asking for my opinions on business ideas, asking if domain names people held were good ones, asking if I was interested in partnering on some project or other, asking how to get access to the raw data (see the comments in the main entry -- there's a link to the fax forms), and asking how I ended up being referenced in a WSJ article.

This blog also saw a lot of activity because of the article, with a number of people coming here after searching up obvious terms like "Dennis Forbes domain name". I'm still seeing WSJ-related search activity today (maybe hermits are just adding the issue to their apartment newspaper mountains).

I've received requests for radio interviews (I've done a couple of those before, and it isn't my favourite genre: I'm too full of self-doubt when it comes to accuracy, and mortally fear the possibility of saying something incorrect in response to an ad hoc question. In such an instance I'd rather say nothing until I can verify, with certainty, that what I'm saying is correct. I haven't been "blessed" with the arrogance and confidence that allows some to make the most absurd of proclamations with zero self-doubt or hesitation), and have gotten requests for, and responded to, several email interviews.

All in all a very entertaining process, and it was interesting to take part in it. It has me looking for my next angle for media exposure.

How To Become The World's Pre-Eminent Domainologist

Of course Lee was being facetious when he assigned me this title, and really I found it gut-bustingly hilarious when I heard it myself.

The original domain name article actually came about because I needed a medium-sized database to demonstrate high-performance database operations. While I was indeed curious about domain names, ultimately I requested access purely to have a large set of data to demonstrate some index-backed operations. I was shocked when I discovered that one could actually acquire a copy of the zone file. 

I really haven't been poring over zone files for years, amazingly reading trends and consistencies from streams of raw data.

After receiving the data, I saw that it really was interesting and entertaining, so in a single night I threw together the original article: Right after getting my credentials from Verisign, I downloaded the 850MB compressed file, extracted, imported and cleansed it, and then ran some humorous queries to see if it yielded interesting results. Seeing some of the answers, I thought it would be good blog fodder so I tossed an article together and put it online.
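The import-cleanse-query workflow described above might look roughly like the following sketch. It is not the code used for the original article: the sample lines, the simplified "NAME NS NAMESERVER" layout, the table and column names, and the choice of SQLite are all assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical sample in a simplified zone-file layout ("NAME NS NAMESERVER").
# Real zone files are far larger and may differ in detail.
SAMPLE_ZONE = """\
EXAMPLE NS NS1.EXAMPLE.COM.
EXAMPLE NS NS2.EXAMPLE.COM.
CARS NS NS1.CARS.COM.
A NS NS1.HOST.NET.
; a comment line
MY-BLOG NS NS1.HOST.NET.
"""


def parse_domains(text):
    """Return the distinct, lower-cased domain labels that have NS records."""
    names = set()
    for line in text.splitlines():
        parts = line.split()
        # Cleansing step: skip comments, blanks, and non-NS records.
        if len(parts) < 3 or parts[0].startswith(";") or parts[1].upper() != "NS":
            continue
        names.add(parts[0].lower())
    return sorted(names)


def load(conn, names):
    """Import the names and index the column the queries will filter on."""
    conn.execute("CREATE TABLE domains (name TEXT NOT NULL, length INTEGER NOT NULL)")
    conn.executemany("INSERT INTO domains VALUES (?, ?)", [(n, len(n)) for n in names])
    conn.execute("CREATE INDEX idx_domains_length ON domains (length)")
    conn.commit()


def count_by_length(conn):
    """Example query: how many registered names exist at each length."""
    rows = conn.execute(
        "SELECT length, COUNT(*) FROM domains GROUP BY length ORDER BY length"
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
load(conn, parse_domains(SAMPLE_ZONE))
print(count_by_length(conn))
```

Storing the name length as its own indexed column is one way to keep "how many names of length N" style queries cheap; a real import of this size would also want batching and a file-backed database.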

Over the next week I only had a free moment here and there, so I belatedly put up a follow-up article, in my haste skipping many of the tests that I had promised (for instance the English language queries, which I only finally finished at Lee's request).

My interest was short-term, and my technique was mostly driven by the biggest bang-for-the-buck queries that would yield interesting blog material, while allowing me to save my time for my family and my profession (in that order). I wasn't really sitting there month after month anxiously watching streaming domain name data, inferring complex patterns like a savant. Instead it was a couple of low-hanging fruit queries against the imported dataset, writing up the results when it was unexpected or entertaining.

Of course then the material seemed to be exhausted (the follow-up article saw much less attention), and my personal curiosity waned. The database then sat and collected bitular rust.

It's a marvel that it didn't get deleted to free up room for my prime-number database, or my ridiculously expanding set of digital pictures.

Then Lee called, I fired up the database -- to my surprise I still had it -- and the rest is history.

On Making A Hugely Popular Blog Entry

Populism has seldom been a goal of these entries, but a couple of entries, not to mention observations of the meme sites, have given me some insights into what are some elements that increase the probability of an entry taking off. Let's just say that I'm the World's Pre-Eminent Meme Site Popularity Assessment Expert (WPEMSPA, aka Wimpy-Sumpa).

  • Obvious topic, and easy consumption - There is a general inverse correlation between the number of words in an article/entry and its popularity on sites such as Digg and Reddit, not to mention blog references. This certainly isn't universal -- I've come across some incredible essays on those sites -- but the easiest pages to get widely linked include a relatively significant amount of graphics, or a very limited amount of text.

    Note the predominance of "top 10" style entries, where readers don't even have to read the summaries, instead deriving impressions based only on the list positions.

    Bloggers love to reference these sorts of articles because it saves time "RTFMing", allowing them to skim and post summaries of dozens of stories they've seen across the web without the hassle of actually reading them. Generally the only lengthy articles that get widely linked are those from sites such as the NY Times, or by long known industry figures such as Paul Graham, because people can skim and just assume the rest of the content, presuming that the author or publisher assures them that the rest must be half decent as well.
  • Everyone can relate - Almost everyone, including the non-technical, has contemplated the idea of creating the next .COM success story, jetting around in their own personal 737. Many have visited a registrar, desperately punching in combinations in hopes that by some amazing coincidence no one ever bothered registering cars.com and similar low-hanging domains. They heard the get rich quick stories, so they want to get rich. And quick!

That's about it. Create entries covering everyday topics, and populate them with easy-to-digest graphics, and summaries that give cursory linkubators comfort that they're linking something interesting.

Enjoy the endless incoming traffic!

Tuesday, May 23 2006

For close to two months now, I've been rather negligent of this blog. The reasons are numerous, however the following is a list of the primary causes.

  • My wife is back to work as a laboratory scientist, now that maternity leave is complete, so "free time" (if such a thing exists with two small children) is getting squeezed entirely out, and...
  • ...Professionally I've been extraordinarily busy, pursuing some new business avenues and opportunities, making it very difficult to allocate time to finishing articles-in-progress. 

    A partial motivation in maintaining a blog/original content system at the outset was to get some "cheap" (if the time dedicated to creating content was valueless) PR to drum up some consulting/software development customers, however that necessity has largely disappeared (and it was only intended as a fail-safe anyways. I never had to actively look for clients, instead relying upon business contacts and word of mouth. I've actually had to turn away most blog-sourced business due to a lack of capacity). Furthermore, as a PR vehicle for 360notes.com, I think the product itself will earn far more attention than any pimping in these entries ever would.
  • Lastly, but certainly not least, the incredible success of the DNS entries makes everything else almost seem anticlimactic.

    I remember when I first started posting online papers, getting giddy to see that a half a dozen people read them in a week (and I carefully did reverse-IPs to see where they came from, following every referral back to the source), which I knew by downloading and looking at the logs every 15 minutes. As time went on, however, and readership increased, the "dose" required to have any motivational effect inflated, such that having several thousand distinct viewers (e.g. 10,000+ "hits", however nebulous that metric is) in a given day starts to almost seem like a failure (I see newspaper articles gushing about whatever human interest blog of the day caught their eye, and it makes me cynical seeing that they only have 600,000 visitors in a month. "That's only 20,000 visitors a day!"). It's strangely discouraging to think that new efforts will yield only a small portion of the attention the disposable DNS entries did.

    I'm completely over the "hit craving" stage that most bloggers/original content producers go through, and almost entirely disregard the stats. From this perspective, and hoping that I can find a small amount of available time, I'm going to finish up some long-in-the-making articles, along with some other content that I've been wanting to explore. Through it all I promise to disregard the stats.

Thanks for reading along, and have a fantastic day and week ahead!

Dennis

Thursday, March 16 2006

Before I go about possibly reinventing the wheel, I thought it worthwhile to ask: Could anyone point me to .NET / Windows server modules for SXIP 2.0 and/or OpenID? They're both fairly trivial identity solutions, so if I can't find one I'll implement one or both. Not only for personal needs, but because I can see some uses for them in client projects.

Thank you kindly.

Friday, March 10 2006

If you believe the Alexa graph, Digg is a significantly more popular site than Reddit, with the gap growing larger every day.

I find these results surprising. I've had half-a-dozen entries on the front page of Reddit, with each yielding from 2500-5600 distinct referrals per day. In comparison, I've had a front page entry on Digg for a day, only bringing in ~750 referrals.

Of course, a single entry isn't a very good sample, and it's entirely possible that most people just weren't interested in the link -- that it was a fluke that it got on the front page in the first place -- but I've seen several other stat watchers mention very similar stats (that a front-page of Digg yielded them 800 or so visitors per day), so I'm not basing this comment simply on my own observation. I've looked for exceptions to this, and found one individual who had a broad-interest link atop Digg's front page for 24 hours straight, and they claim that for that day they received a total of 7200 distinct referrals, and then it rapidly tailed off, disappearing in two days. That case seems to be the exception.

One possibility is that Digg offers more link diversity and thus the much greater traffic is dispersed, significantly reducing the impact on any one link. Alternately, perhaps Digg users spend more of their time within the Digg community, rather than following the links (in the same way that many Slashdot readers just make assumptions about the linked article, responding accordingly, rather than RTFA).

Another possibility is that Digg caters to a crowd that is more likely to have the Alexa/A9 toolbars installed, both of which feed back the stats that are used to drive the Alexa popularity metrics. Given that they're somewhat infrequently used toolbars, and are much more likely among certain crowds (and seem to appear in clusters), the traffic rankings are a bit of a crapshoot outside of the top sites -- here on yafla I've had days with 6000 visitors where my Alexa ranking doesn't budge, whereas other days 2000 visitors cause it to quintuple.

  Blogging 
Monday, March 06 2006

One of the continuing trends of the Web 2.0 revolution is tag-mania -- sticking tags on everything and anything, hoping that it somehow improves the flow, digestion, and utility of information. From adding tag clouds to your blog, to slashdot, to photos, to bookmarks, tags have continued to spread across the web landscape.

Burlington Skyway

As with every tech "revolution", in corporations across the globe eager employees are embracing the trend, advocating adding tags to documents and directories and files, and embracing the concept of metadata.

As a bit of an explanation for those who haven't been following TechCrunch in morbid curiosity -- wondering what dubious business came out of super-secret stealth alpha invite-only mode today -- and thus aren't up on their Web 2.0 lingo, tags are, in essence, a set of words that one or more users apply to something to categorize it -- what we historically called keywords, albeit sometimes (though not always) with a "democratic" process determining the rendered tag set.

For instance the tags of this post might be "Web 2.0, tags". Ten visitors might add "tripe", making it the dominant tag in the tag cloud.

Getting a variety of people adding tags to the same content, or building a common directory of information loosely categorized by tags, is what's commonly called a folksonomy. Consider, for comparison, a formal taxonomy of a system like Yahoo's classic categorization, where a submitter would choose exactly where in the hierarchy a link went, and the Yahoo overlords would validate it, and insert it if appropriate. Instead the loose addition of tags adapts to have multiple categorizations over time.
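The "Web 2.0, tags" versus "tripe" example above can be sketched in a few lines. This is only an illustration of how a tag cloud might tally tag applications; the function names and the one-vote-per-visitor rule are assumptions, not a description of how any particular tagging site works.

```python
from collections import Counter


def normalize(tag):
    """Fold case and surrounding whitespace so 'Tripe ' and 'tripe' match."""
    return tag.strip().lower()


def tag_cloud(author_tags, visitor_tags):
    """Count every tag application, most-applied tag first."""
    counts = Counter(normalize(t) for t in author_tags)
    for tags in visitor_tags:
        # Assumption: each visitor may apply a given tag to the post only once.
        counts.update({normalize(t) for t in tags})
    return counts.most_common()


author = ["Web 2.0", "tags"]
visitors = [["tripe"]] * 10  # ten visitors each add "tripe"
cloud = tag_cloud(author, visitors)
print(cloud[0])
```

The author's two tags are each applied once, so ten visitors adding the same tag is enough for it to dominate the rendered cloud.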

[Web 2.0 aware readers will probably shudder seeing an explanation of something so "basic", yet discussions in the field have led me to believe that much of this great revolution has gone unnoticed by the bulk of society, including even the majority of technology workers. I regularly converse with people who've never seen del.icio.us, don't know who 37signals are, and haven't been to Reddit or Digg or Flickr or Furl. Much like bloggers have grossly overestimated the impact of blogs on the general population, there seems to be a presumption that the Web 2.0 lingo and dogma is more universal than it actually is]

While many of the Web 2.0 aficionados declare there to be a fundamental religious difference between the venerable keyword and tags, the difference is superficial at best (democratically selected keywords are still just keywords). The same keywords that have always existed as a data block in the JPEG file format, and exist in virtually every document format (Word, for instance), form the foundation of tags. Metadata has been around since we first started storing data, and tags are a continuation of that trend.

Many of the foundations of modern tagging, the evolution of the keyword, were first demonstrated widely by the superlative web photo organizing and sharing application Flickr.

Given the primitive state of image recognition, this was a perfect fit: Without tagging your photo with keywords such as "bridge, burlington skyway, qew", there was no way searches could find that photo if asked, for instance, for pictures of the Burlington Skyway bridge -- we aren't yet at a stage where software can reliably figure out what the subjects of a picture are, and mechanical metadata is still incomplete (although it's getting there), so keywords/tags/folksonomies fill a critical gap in the photography data process.

Outside of photos the use of tags is often much more dubious.

To go back in history a bit, when search engines first appeared they largely relied upon meta keywords. This was a compromise due to limits in the "comprehension" of content -- search engines got confused easily, and even when they could parse the content properly they couldn't truly figure out what the content was about.

Keywords came along, offering a simple, condensed, human-created subset of the data, categorizing the important attributes of the content. Search engines embraced and utilized keywords as an important element of fulfilling search requests.

The honeymoon didn't last for long. It turned out that keywords were a prime stomping ground for search engine spammers, not to mention that it was a horribly limited method of searching through data: Not only were the choices of keywords entirely subjective -- often grossly incomplete and inconsistent -- but by design it was limited to a very, very small subset of the content. If you really wanted content about metal railings, you might have missed my extensive discussion on that topic in my Burlington Skyway Bridge article because I didn't feel that metal railings made the cut for the keywords.
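The metal railings problem is easy to reproduce with a toy example. The page record below is invented for illustration (it is not the actual Burlington Skyway article), and both matching functions are deliberately naive sketches rather than how any real search engine worked.

```python
# Hypothetical page record; the body text and keyword list are invented.
page = {
    "keywords": ["bridge", "burlington skyway", "qew"],
    "body": "The Burlington Skyway carries the QEW over the harbour. "
            "Its metal railings were replaced during the last refit.",
}


def keyword_match(page, query):
    """Match only against the author-chosen keyword list."""
    return query.lower() in (k.lower() for k in page["keywords"])


def content_match(page, query):
    """Match against the text of the page itself."""
    return query.lower() in page["body"].lower()


for query in ("burlington skyway", "metal railings"):
    print(query, keyword_match(page, query), content_match(page, query))
```

A search for "metal railings" finds nothing in the keyword list even though the page discusses them, which is exactly the small-subset limitation described above.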

Meta tags are largely dead now.

Lake Ontario

In their place search engines have become much better at determining what a given page is about (or at least simulating a reasonable proximity thereof). By analyzing content, having a directory of similar and derivative words, and by deriving information by context (such as links and related pages, and how they word links) and layout (noting that heading text, title, and early text hold more importance in classifying the page, though they are still used in concert with the rest of the content), search engines have come a long way in understanding content, and in correlating searches with appropriate results.
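As a rough illustration of the layout point, here is a minimal sketch of scoring a page by where the query terms appear. The field names, the weights, and the two sample pages are all assumptions chosen for the example; real search engines combine far more signals than this.

```python
import re

# Assumed weights: title and heading hits count for more than body hits.
FIELD_WEIGHTS = {"title": 5.0, "headings": 3.0, "body": 1.0}


def tokens(text):
    """Split text into lower-cased alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, query):
    """Sum weighted occurrences of the query terms across the page's fields."""
    terms = set(tokens(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        hits = sum(1 for t in tokens(page.get(field, "")) if t in terms)
        total += weight * hits
    return total


# Hypothetical pages, invented for this example.
page_a = {
    "title": "Burlington Skyway Bridge",
    "headings": "History",
    "body": "Notes on the bridge and its railings.",
}
page_b = {
    "title": "Weekend notes",
    "headings": "",
    "body": "We drove over a bridge, then another bridge.",
}

print(score(page_a, "bridge"), score(page_b, "bridge"))
```

The first page mentions the term no more often than the second, but the title hit outweighs the body hits, so it ranks higher for the query.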

The loss of the keyword has proven to be very beneficial for search. Now it's the actual data that classifies the content, rather than artificial metadata.

With improvements in language processors and context associative correlations (e.g. where the content parser understands that the paragraph on boxers is talking about the boxer breed of dog, determined by its correlation with other documents coupled with other details of the language, using language trees to classify probable meaning), things will only get better.

Content search has a very bright present, and a brighter future.

Yet tags continue to spread in woefully inappropriate domains, even where they're serving as nothing more than the modern day equivalent of the venerable META keyword. Instead of building reliable, feature-rich search tools into products, appropriately determining relationships and context to understand content, product vendors are just tossing in a hack-job tag infrastructure and calling their job complete.

Worse still, users are accepting it and calling it a feature.

  Blogging 


Dennis Forbes - Dennis Forbes is a Toronto-based software architect and technology writer