Dennis Forbes on Pragmatic Software Development
 
Tuesday, June 20 2006

Early Perspectives On Open Source

Years back -- in the ancient Web 1.0 world of 1997, when Slashdot was just beginning to enter the geek consciousness -- I became embroiled in an impassioned debate with a peer, arguing the relative merits of open/free software versus commercial/pay software.

Flurries of emails rained over our respective groups, as we fought to evangelize and fortify our positions, building bunkers of suppositions and hyperbole.

At the time I was a fervent admirer of Microsoft and their Ways -- a position that led to endless accusations that I was a covertly paid astroturfer for the so-called Evil Empire -- not to mention that I was, and remain, a true believer in the capacity for financial incentive to encourage innovation and product excellence.

I also rather enjoy the software development profession, and it wouldn't be untrue to say that my position was partly driven by defensiveness, motivated by a fear that open-source software, and the accompanying ideology and fanatical advocacy, undermined my professional existence.

The Face Of Open Source Evangelism

My opponent, in contrast, was a GPL-embracing, Linux-advocating, Microsoft-hating, Stallmanesque sort. He'd finger through his unkempt beard (where one would expect to find stray noodles from a prior meal), and after trying to convince anyone listening that recompiling one's kernel with drivers specific for the target environment was an ideal arrangement, he'd tear into the evils of closed source commercial software, passionately arguing that closed source, along with intellectual property hoarding, was a moral sin.

We debated the merits of "free as in beer" and "free as in speech" software, with him arguing that many software solutions were so ubiquitous and prevalent, and so easily replicated, that they no longer merited payment or intellectual protection: Innovation in that realm was no longer driven by capitalist forces, but instead arose from scratch-an-itch needs.

Everyone would benefit, he advocated, if we all contributed to the global software pool, and we could all pull out what we need, receiving "payment in kind" by way of other people's contributions. If we found an edge/exception condition that the software couldn't handle, or a missing feature that would solve our need, we could code it in and contribute back.

In essence a sort of software socialism.

The particular example we continually debated concerned basic system functionality, specifically that kernel services had been so thoroughly fleshed out, and were so similar across vendors, that we'd reached the point where the foundational operating system should be free as in beer, and more importantly free as in speech. With the basic infrastructure accounted for and implemented well (albeit constantly evolving as people scratch their itches, and hardware vendors implement enhancements to leverage new products), organizations across the land could focus their financial and human resource efforts higher in the software ecosystem.

This was just one example, and he held the same opinion for a wide range of system services and development libraries.

All of the big retailers, for instance, could leech off an open source point of sale system, running on an open source platform, hiring help to customize it to their needs, and then contributing their changes back to benefit others.

The Benefits of Open Source

Win/win for everyone: Retailers got a customized POS system for "nothing" (presuming that evaluations, implementation, and maintenance were free, but ignore those silly nuances) -- to most retailers there is no strategic advantage to using a different POS system than competitors -- which they could customize limitlessly; Customers win because of reduced overhead in the infrastructure, theoretically leading to lower prices; Developers/IT-types win because they get hired on for customization and support, and they have the ability to have as much "inside knowledge" of the product as anyone else (versus a product like SAP, where only those inside SAP have the ability to have complete knowledge of the product, leaving everyone perpetually a step behind), not to mention that a shop will be more open with their wallet, presumably, when they don't have to pay several million dollars to a software vendor.

We'd save the carrot of wealth -- the argument went -- to entice those developers truly pushing boundaries, or gain from the development of large corporations with specific needs ("itches") that would also benefit the overall community, rather than paying tithe to yet another Lempel-Ziv implementation or scorecard system or multitasking kernel, or for yet another reworking of the system services to benefit some up-and-coming new product from the same organization.

And anyways, those people who did contribute to the free-software ecosystem were rewarded in kind by improvements and extensions of what they have done, along with all of the other software they could utilize and leverage.

Seeing the Light...Or At Least Catching A Glimpse

At the time I thought he was nuts, or at least irrationally idealistic.

Not only is it a grossly unbalanced system, where there are magnitudes more consumers than there are creators, but more importantly it seemed to de-professionalize software development: Under the universal implementation of this sort of system, developing software, with software as the product, wouldn't be something that could put food on the table and a roof over one's head, not to mention pay the car loan, put the kids through university, and fund a retirement.

Software development would be limited to hobbyists, parental basement dwellers, and professors attached to the academic teat, disconnected from financial reality. The guy with calloused fingers and tired brain, exhausted after months creating the next open source wonder, barely has a leg up on every johnny-come-lately that appears and offers support and consulting on their creation.
 
It didn't seem tenable, or sustainable.

To be honest, I was sure that the whole open source thing would fizzle out, apart from a couple of large corporate projects run in a traditional manner, releasing source purely as a PR move.

As the years have passed, and my knowledge and the market itself have matured, however, I've been moved more and more to the point of finding myself agreeing with parts of his argument.

Not all of it, though.

I still think the GPL is a wolf in sheep's clothing. I still think the "support" financing angle is laughable for all but the largest, most enterprise-friendly of vendors. I still think the Cathedral and the Bazaar paper is wishful thinking, and is a model that is largely unseen in most open source projects. I still think the overwhelming majority of open source software -- following the whole power law distribution thing -- sees little source code attention beyond that given by the direct developer(s), with many projects abandoned after the initiator fails to see the Linux-like attention she expected. I still think that "it's secure because the source is available!" is ridiculously naïve, hinting at a bit of hopeful denial. I still think the simplistic idea that having the source code (such as to the Linux kernel) automatically correlates with being able to actually do something effective with it hints at a complete lack of experience in software development, where people underestimate the project and domain knowledge required to make effective use of source beyond changing some label constants. I still think many of the large corporate cheerleaders of open source are exploiting the movement, getting software for their expensive, overpriced, proprietary and patent protected hardware on the backs of a bunch of hobbyists, while offering remarkably little in return. I still think many successful open source projects are largely just traditional software products, with a core group of paid individuals who are responsible for the vast bulk of the implementation, and the project just happens to have source code available.

I still think that "proprietary" closed source systems have a significant role to play, and that a lot of the innovation in the market happens because someone is chasing a dollar, and then everyone falls in line behind.

Even in open source, capitalism motivates a large number of projects as developers imagine great support and consulting contracts just around the bend, or the increased ability to monetize their reputation.

Free As In Beer

Yet frequently I'm finding myself asking "you actually want money for this simple hackjob?"

This is especially true in the Microsoft-ecosystem development market: Where the PHP, Perl and Python worlds have powerful, complex, free (both in terms of availability of source code and monetary cost) components and modules covering virtually every need, one often finds that even the most trivial .NET component comes with demands for often significant sums of money, not to mention arduous and annoying licensing and compliance requirements, often bundled in a warm nest of layers of IP protections and redistribution requirements.

Which brings up a critical point -- Even ignoring the upfront cost, software licensing is a major PITA. Add in some hack-job activation scheme or machine lockdown, and you've made me paranoid that a simple hard drive failure or system migration is going to cause major hiccups of administration. Suddenly I have to worry about endless dependencies and additional costs for every developer added on the team, with even casual consultants suddenly needing multiple licenses -- and related activations -- to compile with the tab component or the grid component or the POP3 component.

What a nuisance. Sadly this sort of commercial software overhead is only getting worse as time passes.

So even where I don't care for the source code or its availability, and I don't care about the upfront costs, I'm being drawn to free-gratis software where the need in question is relatively trivial or long proven. It's not that I'm averse to commercial software -- I most certainly am not -- but there is a limit to its exploitation.

The examples range from FreeBSD appliances to 7-zip to the GIMP to JAlbum to JDiskReport to the many free online services. Access to the source code is a nicety, and in some cases is critical, and I always demand trustworthiness of the software in question anyway. However, it really is the gratis element that's most important to me (as it is for many people, though they mask that simple reality under a cloak of open-source camouflage), because it often comes with extremely liberal usage requirements.

Doesn't This Hurt Developers?

Naturally many developers will gasp in horror at the idea of replacing commercial software with free software. Their natural opposition -- and this was the root of my original defensiveness -- is the idea that such an action leads to a contraction of the software development field, basically putting ourselves out of business as more and more of our bread and butter disappears.

"If you aren't paying the Microsoft tax", they argue, "you're dooming us all!".

Nothing could be further from the truth. The Microsoft tax could be what's dooming us all.

Again sticking with Microsoft as an example, given their size and importance, every developer not working at Microsoft -- whether they work for a competitor or for a small IT shop -- needs to remember that Microsoft's eventual goal is to put every one of them out of work. If Microsoft could convince every organization to pay them $10,000 per head per year, and they'd get magically adapting and accommodating, infinitely flexible systems runnable by a minimum-wage temp, they would.

Microsoft is endlessly putting out feelers to determine what shops are paying developers to make, and then they're trying to build wizard-filled, manager-friendly solutions to accommodate that need (for a small fee...). The sales pitch, of course, is a barely concealed statement that you can dump that developer, or at least replace them with someone much less skilled, therefore less expensive.

You know that clever Microsoft ad where the IT department is learning line dancing, given how much easier the new server version is to administer, thus giving them lots of free time? Really it was a not-so-subtle message directed at Vice Presidents -- Here's a way to chop the headcount, putting some people out of work. Just pay Microsoft the not insubstantial cost of $X across your enterprise, and you can save $Y. Even if $Y is lower than $X, capital costs always look "cheaper" on the financials than human resources costs.

How about the Total Cost of Ownership comparisons, where Microsoft proudly shows that the TCO of Windows is comparable to, or slightly lower than, that of Linux for certain needs? They do this by minimizing manpower needs and skill requirements, meaning that you pay more to Microsoft for all of those server licenses and client access licenses, and then you can pay a little less to employees.

It doesn't sound quite as inspiring from such a perspective. When you really step back and look at it, and realize that Microsoft -- or any one software vendor for that matter, and I'm not intending to pick on Microsoft but to use them as an example -- doesn't have interests that are necessarily aligned with yours, Mr. uISV or corporate IT worker bee or consultant, it starts to seem more and more bizarre how defensive many professional software developers are about Microsoft relative to open source. How bitterly they complain about open source, feeling that every win for Linux or Apache or Firefox threatens their own future.

Efficiency Benefits Us All

None of this is to say that efficiency isn't important -- it's critically important: When I buy a loaf of bread, or shop at a grocery store, or buy a car, it's to my benefit for those organizations to do everything they can to minimize costs, even if that means leveraging systems that make some of their workforce unnecessary. So if Microsoft can throw together a cost effective Biztalk Infopath Sharepoint Portal.NET Server 2007 that eliminates the need for a significant percentage of custom development across the land, then that benefits everyone (well...apart from those directly affected -- the people directly losing the jobs).

We all get a little bit more for a little bit less.

The critical point I'm trying to make, however, is that from a personal perspective the argument is not, and has never been, free software -or- commercial software. In fact many commercial software ventures can be helped by the cost savings of free software (presuming that there aren't license conflicts). Your solution looks a lot more palatable when it leverages a stack of free, well-known software, versus requiring a complex mix of server systems and expensive third-party software (we once considered a solution that would have sold for less than $500, yet would have necessitated about $4000 of Microsoft software). Your IT team looks a lot more cost effective when the shop isn't spending millions of dollars a year maintaining various Software Assurance plans, spending tens of thousands on various marginally useful software products per developer head.

It's hardly surprising that many of the "Web 2.0" success stories started out on, and then scaled up with, open source platforms: Their beginnings weren't hobbled by software fees that exceeded the yearly take of the founders, and they could scale out cost effectively given they didn't require endless CPU licenses, client access licenses, and component fees.

From a personal perspective, I still build solutions largely for the Windows platform, often using integrated technologies such as .NET, ActiveDirectory, and IIS, while leveraging servers such as SQL Server and Biztalk. Yet I do so from a very pragmatic, looking-out-for-#1 perspective -- If I could reasonably justify switching to a free platform, I would. I owe no duty to Microsoft to keep their revenue stream padded.

How This All Came About

Recently I had the need to receive some PGP encrypted files on a server, which necessitated the ability to decrypt them from a script. Seemed pretty straightforward, especially given that I was doing something very similar with PGP freeware over a decade ago.

"Uh oh," I thought. "Here comes the ubiquitous server tax." (The server tax is the fact that commercial software, no matter how trivial, has a propensity to cost bizarrely much if it runs on a "server".) I headed over to PGP and took a look at the command line version.

This is, in essence, just a slightly updated version of the freeware PGP circa 1995, so I was a little surprised by the price (not to mention the nonsensical sales criteria -- you can buy send only, receive only, or the super-deluxe send and receive. You can also choose to use only 1 key, or many keys. All ridiculous separations that just lead to implementation nonsense and complexity). At the time of writing, the component in question costs a staggering $3,170.

Maybe it's "chump change" for a large enterprise, where such costs get buried under thousands of similar purchases, but these sorts of licenses -- not to mention the administration and hassle of them -- add up. They come out of the general IT pot, meaning fewer hires, smaller bonuses, more cost-efficiency pressure, and so on.

I went and grabbed a copy of GnuPG, and found that it immediately fit the need perfectly. The code is modern, and it is fully compatible with OpenPGP implementations. I can use 1, 10, or 100 keys with no additional costs. I can run 1 instance or 100 instances if I want. I can send and receive. I never have to hassle with licensing issues, beyond of course ensuring that any code changes I distribute (given that I legally and immediately get the source code) conform with GPL requirements.
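For the curious, decrypting from a script can look something like the following. This is only a minimal sketch, not the script I actually ran: it assumes the `gpg` binary is on the PATH and that the private key is already in the keyring and usable without an interactive passphrase prompt. The function names and file names are hypothetical, made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def build_decrypt_command(encrypted: Path, output: Path, gpg: str = "gpg") -> List[str]:
    """Return the argument list for a non-interactive GnuPG decryption."""
    return [
        gpg,
        "--batch",                # never prompt; fail instead of waiting for input
        "--yes",                  # overwrite the output file if it already exists
        "--output", str(output),  # where the decrypted data is written
        "--decrypt", str(encrypted),
    ]


def decrypt_file(encrypted: Path, output: Path) -> None:
    """Run GnuPG and raise CalledProcessError if decryption fails."""
    subprocess.run(build_decrypt_command(encrypted, output), check=True)
```

Calling `decrypt_file(Path("report.pgp"), Path("report.txt"))` from a scheduled job would then either produce the plaintext file or raise an error the job can log. Building the argument list in a separate function keeps the command easy to inspect without actually running GnuPG.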

The total implementation cost, especially after considering purchasing and licensing issues (i.e. none), was dramatically less than the commercial offering, while providing every bit of functionality we needed (and then some), leaving that money in the pot for other needs. We have the added comfort of being able to plan massive scale-outs with minimal additional cost, increasing the project's versatility and flexibility.

Win/win.

Thursday, June 29 2006

Information Hoarding

Before the internet, well-stocked public libraries, and other venues of information dissemination, institutions of higher learning (e.g. university) held more importance. Their influence was earned by having the best people -- most research and innovation happened at universities -- along with all of the important information (great university libraries, which could be visited by only the few), and the resources (chemicals, scientific equipment, medical equipment, telescopes, the Abacus.NET 1922 Pro Edition) necessary to learn and master a profession.

In those days it was close to impossible for someone to gain knowledge in a field -- much less become an expert (recognized or not) -- without overcoming significant barriers to entry and becoming one of the few to partake of these fine institutions.

For most fields there simply was no other way of getting "in".

For the few who did have the pedigree and financial means, and managed to get in, there was the job security of the simple fact that the number of new entrants was artificially limited, and could be modulated with ease when the need arose.

In recent history, however, much of this information, and many of the tools -- both practical and research/training -- have been liberated ("democratized") for some professions, most especially software development. Many of the barriers to entry have fallen.

The Liberation of Software Development

We now live in an era where it's entirely possible for a grade school dropout to learn from the best minds in the industry, to freely use some of the best tools and innovations available (when I started in software development, the costs for even the basic tools were substantial, and of course piracy options for those so inclined were much less accessible: Your peers could get you a copy of Wing Commander or Mule, but not Visual C++ 1.0. Now anyone can get incredibly rich development platforms and tools for nothing, even for the Microsoft platform), and to build class-leading solutions using the best industry patterns and practices.

All without getting their break via the traditional route.

Even for those with their computer science degree -- university is vastly more accessible, both from availability and financial perspectives -- often they'll tell you that what they're applying today is a combination of what they knew before they went, what they learned during intern/co-op placements, and what they learned after getting out.

Graduates of the University of Waterloo CS program, for instance, are in heavy demand largely based upon its excellent and comprehensive co-op placements. In essence employers like it most for the time the students spent out of the classroom -- time where they often acted as extremely junior development partners, many times relegated to mindless grunt work. [Another reason for the University of Waterloo's excellent reputation is a bit recursive: Given its reputation, there's the presumption that the best of the best apply, and only the best of the best of the best get in. It's a bit of a self-sustaining loop, so a Waterloo degree is often used as a sort of crude filter, in the same way that Ivy League degrees have influence.]

So why bother with the whole going-to-class-for-four-years supposedly to learn CS thing? It almost seems more effective for everyone to compete to get on an artificially short list, after which they can go on Manpower assignments for several years.

Really that's sort of where the field seems to be going.

Many software development employers ask only for a University degree nowadays -- regardless of the lack of relevance of the major -- using it only as a resume deflection shield, presuming (often incorrectly) that it'll yield them candidates meeting a minimum level of intelligence and commitment. Past whatever largely arbitrary minimum requirements are mandated, relevant experience is often considered far more important than educational accomplishments.

This is an acknowledgement that this profession is at a point where anyone can have the best tools in the industry with just a couple of downloads (those expensive tools that many large shops still like to embrace are more frequently a hindrance than a help, and offer few advantages), and they can leverage and learn from the solutions and experience of the best the industry has to offer. There are remarkably few barriers to entry, outside of a couple of niche areas where experience on uncommon platforms (e.g. SAP) or hardware (e.g. mainframes) is required.

Oh, except skill. There's still that barrier to entry. It tends to be a pretty big barrier to entry.

In many ways software development is mirroring the literary or culinary worlds, where higher learning is pursued for the skills gained rather than the credentials, and competition is open for anyone with just a pen and a pad of paper (or better yet a typewriter and a stack of 8 1/2 x 11), or a couple of pans and a stove.

The Great American Novel

We can all cheaply talk about how we'd like to write the next Great [Insert Nationality Here] Novel, but really it's nothing more than cheap talk if we aren't well on our way to actually doing it. There's nothing externally stopping us, yet remarkably few of us ever will.

Remarkably few of us really have the innate skill, regardless of the seeming ease of taking the first steps. We could all be great chefs (great chefs aren't just people who have been anointed by a group or individual -- they're actually capable of extraordinary things, and earn respect for what they can do) with a couple of pans and ingredients at the grocery store, but most of us never will be.

You can buy yourself the most expensive pens, or the most incredible pans on the most expensive Viking stove, using the rarest and most exclusive ingredients you can find, but it still won't automatically make you any good.

Great Photographs

This all came to mind after hearing yet another comment that my photographs of my children were "professional quality". While my photos are half decent, it's more indicative that people need to modify what they consider "professional" in the field of photography, because the bar has been substantially raised. A quick browse through the endless extraordinary photos on Flickr makes that quickly evident.

There was a time, in the era before digital cameras, and perhaps more so before accessible 35mm cameras, when one became a "professional" photographer largely by putting out a significant chunk of capital and buying some expensive equipment. Put out the cash and buy a nice medium-format camera and all of the accessories (the more lenses, flashes and cool looking gizmos, the more professional one was), and one was 80% of the way towards being considered a pro. Everyone else was stuck using garbage little cameras that were basically incapable of taking good pictures.

Hang your sign and start photographing weddings.

With the dropping price of 35mm cameras, things improved somewhat, but even then there was the substantial barrier to entry in the form of learning through experience: Going through rolls and rolls and rolls of film, and the corresponding development, was a very expensive way to learn through mistakes. I remember the serious contemplation that preceded every single shot with my 5xi 35mm, because it would end up costing me over $1 a shot after adding in processing.

Taking multiple shots with differing exposures or focuses or depths of field simply wasn't an economic possibility, so many scenarios where I might have gotten a great shot were limited by my fiscal constraints.

Now with digital cameras, especially some of the nicer offerings, learning through experience is inexpensive and provides immediate feedback, and there are controls for virtually everything. While there are still the diehards who'll go to the ends of the Earth defending 35mm film (which itself was considered laughably inferior to medium format at the time), the results of my Canon Digital Rebel XT are far beyond anything I ever achieved before. Couple this with the fact that I can take thousands upon thousands of amazing pictures, experimenting with exposure and focus and depths of field and motion blur and shutter speeds.

Invariably some of them turn out pretty good. I've actually read the manual to my camera, and have long understood the basics of photography, but I am hardly a professional in this field.

The barrier to entry to take some great photos has substantially fallen, and I'm sure it's put tremendous pressure on a lot of hack photographers. Many of them are loading up on as many expensive lenses as they can buy, even where they don't use them, and hardware as extravagant as they can find, all to try to differentiate themselves from the commoners, yet ultimately the only thing that matters is results.

I've seen some pictures taken using low-end consumer cameras -- quality has risen so much, even at the lowest levels -- that are breathtaking. These are taken by people who would have never ventured into photography at all before.

Of course being able to luck into, or at least hack into, some good results doesn't make me a great photographer. I wouldn't hire me as a wedding photographer, where you can't luck into a couple of good photos, but rather have to capture fleeting moments with quality and consistency.

So while the barriers for neat or beautiful or staged photos have dissolved, for critical fleeting-moment types of needs there still remain some barriers to entry, in that few will trust someone with their event just because they took some good photos of livestock. In those niches you need proven experience before people will give you the opportunity to gain experience.

Tuesday, July 18 2006

Way back in junior high I had a good friend who was a huge fan of military aircraft.

His bedroom walls were covered with huge, hard to procure and often expensive posters of these deadly devices. His desk featured an actual (albeit non-functional) 20mm shell, of the variety used in the depleted-uranium spewing gatling gun.

His favourite military fighter jet happened to be the F-15 Eagle.

Feeling a little left out, I started poring over his resources, carefully reading his encyclopedias of fighter aircraft, absorbing all of their attributes. I decided that my favourite fighter jet was the F-14 Tomcat: Clearly its ability to land on carriers, its swing-wing engineering, and the long range Phoenix missiles it supported, made it the superior aircraft.

There was no way the F-15 Eagle compared, I argued. The F-14 Tomcat was obviously the choice of those in the know. The enlightened ones, if you will.

Yet the reality -- and I think my friend Brian always knew it -- is that I chose the F-14 primarily because it wasn't the F-15. After picking a natural alternative, I started building layers and layers of justifications for my decision.

I see the same sort of thing fairly typically in software development: Big up front design versus agile designs; Getters/Setters versus fields; namespace naming guidelines of type A or type B; variable naming standards; stored procedure naming standards (or the religious "stored procedure versus dynamic SQL" argument that rages on in teams across the lands); the sorts of types to use for primary keys; the languages and platforms to choose; whether or not to use XML, and what to use it for.

So many times, it seems, people choose their positions based not on actual analysis and honest beliefs, but rather because they're countering someone else in their team -- especially when attempting to undermine authority, actual or perceived -- or they're battling someone else in their organization (that dastardly team in Sector G that's trying to get kudos by setting the development guidelines!), or they're deriding someone in the industry.

Often they're just trying to be different and difficult, and the beauty of software development is that there are many, many right ways to do it, and it's easy to find allies in discussion groups to assure one that everyone else is an idiot, and their new position is the One True Way.

It's easy to appeal to authority, given that there's some big name or organization that, in some form, promotes just about every software development practice and standard imaginable (Microsoft is a particularly good example of this, as throughout the organization they follow so many standards and practices that one can easily find an example conforming with one's dogma, holding it up as proof that it's the "Microsoft way", ignoring the many exceptions).

Of course all of this doesn't preclude disagreement on standards and processes and techniques -- people often truly disagree because they legitimately and rationally believe something different. In a team full of intelligent, self-directed professionals, such disparate beliefs and conclusions can be enormously beneficial. The problem is when interpersonal issues materialize as technical disagreements.

Wednesday, July 19 2006

Today -- apparently on the front page of section 2* -- my name was mentioned in the Wall Street Journal. Unrelated to my general professional pursuits, but still it's sort of neat to appear in such a prestigious paper.

* - I don't generally read things about me, or listen to things with me. Maybe it's a superstition, but I just find it creepy.

Tuesday, July 25 2006

The Wall Street Journal Mention

Last Wednesday I was mentioned in the Wall Street Journal (right there on the front of the second section of one of the world's most prestigious newspapers), being referred to as the "world's pre-eminent domainologist" (an article that has been referenced in countless other sources now, including some errant attributions, such as the Toronto Star -- my hometown paper -- seemingly making me a Verisign employee, which of course I'm not). 

Apparently -- or so my wife tells me, given that I don't listen to or read anything that involves me in any way, and even when she talks about this stuff I cover my ears and basically repeat "LaLaLa"s to drown it out -- it was a well-written, humorous piece. While I apparently played the part of a fringe bit-player, my name does appear quite early in the article, and that's pretty neat to me.

The mention doesn't bring me monetary rewards, and it doesn't contribute to my professional success in any measurable way (though it's very neat being mentioned, and it was a hugely fun process working with Lee to get the raw material and provide some basic quotes, I'm not in the business of domain names, and it isn't really a hobby of mine -- being attributed in such a way isn't something I particularly want to leverage), but it is yet another weird, discordant mention in mainstream media.

So long as it isn't the notorious sort of mention, somehow it all works into my grand plan of world domination. Bwahahahaha! <rubs eyebrows>

Origins

It all began with a couple of emails from Lee. He indicated that he was from the WSJ, and was interested in talking with me about an article he was considering. After some difficulty finding a common point of availability, we finally chatted in person. This was around Wednesday of the week before.

That evening Lee recorded an initial phone interview, indicating that he had come across my article from back in March, and knew that it had seen a lot of success (for those who didn't see it, it was an article that took off like wildfire across the net, seeing front page action on Digg and Reddit, and mentions from numerous "A-list" bloggers. Quite a few of the entries on here have seen wide "link-love", but the domain name entries absolutely blew all prior -- and following -- records away, seeing close to 100,000 visitors a day for a period of time, and still maintaining a lot of incoming interest).

Given that he hadn't come across similar research (he did ask if I knew anyone else doing similar research, perhaps probing to see if I was just a sub-eminent domainologist who would defer to a greater authority), he decided to base his article on information I provided, both in the initial article and in numerous follow-up queries he asked me to run.

One particularly exhaustive query took around 20 hours of runtime.

All in all it was a lot of fun, and from my end was nothing more than a couple of very brief phone interviews, and then some randomly kicked off queries and emailed results.

What a WSJ Mention Gets You

Despite the fact that the article in question provides limited personally identifying information (and while it's accurate, it is a bit misleading for some. For instance I'm not in New York City -- I'm actually here in a suburb of Toronto -- and the article of course apparently doesn't mention this blog), the immediate effect of the article was dozens of phone calls from people across the US -- and the world -- asking for my opinions on business ideas, asking if domain names people held were good ones, asking if I was interested in partnering on some project or other, asking how to get access to the raw data (see the comments in the main entry -- there's a link to the fax forms), and asking how I ended up being referenced in a WSJ article.

This blog also saw a lot of activity because of the article, with a number of people coming here after searching up obvious terms like "Dennis Forbes domain name". I'm still seeing WSJ-related search activity today (maybe hermits are just adding the issue to their apartment newspaper mountains).

I've received requests for radio interviews (I've done a couple of those before, and it isn't my favourite genre: I'm too full of self-doubt when it comes to accuracy, and mortally fear the possibility of saying something incorrect in response to an ad hoc question. In such an instance I'd rather say nothing until I can verify, with certainty, that what I'm saying is correct. I haven't been "blessed" with the arrogance and confidence that allows some to make the most absurd of proclamations with zero self-doubt or hesitation), and have gotten requests for, and responded to, several email interviews.

All in all a very entertaining process, and it was interesting to take part in it. It has me looking for my next angle for media exposure.

How To Become The World's Pre-Eminent Domainologist

Of course Lee was being facetious when he assigned me this title, and really I found it gut-bustingly hilarious when I heard it myself.

The original domain name article actually came about because I needed a medium-sized database to demonstrate high-performance database operations. While I was indeed curious about domain names, ultimately I requested access purely to have a large set of data to demonstrate some index-backed operations. I was shocked when I discovered that one could actually acquire a copy of the zone file. 

I really haven't been poring over zone files for years, amazingly reading trends and consistencies from streams of raw data.

After receiving the data, I saw that it really was interesting and entertaining, so in a single night I threw together the original article: Right after getting my credentials from Verisign, I downloaded the 850MB compressed file, extracted, imported and cleansed it, and then ran some humorous queries to see if it yielded interesting results. Seeing some of the answers, I thought it would be good blog fodder so I tossed an article together and put it online.
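For the curious, the import-cleanse-query loop can be sketched in a few lines. This is only a rough illustration, not what I actually ran: it swaps in Python and an in-memory SQLite database, the sample lines and their layout are made up (the real zone file format isn't reproduced here), and names like `import_names` and `names_of_length` are hypothetical.

```python
import sqlite3

# Hypothetical sample rows, assuming the domain label is the first field on
# each line. A real zone file is vastly larger.
SAMPLE_ZONE_LINES = [
    "CARS NS NS1.EXAMPLE-HOST.COM.",
    "cars NS NS2.EXAMPLE-HOST.COM.",
    "COOLPICTURE NS NS1.EXAMPLE-HOST.COM.",
    "",
    "X NS NS1.EXAMPLE-HOST.COM.",
]


def import_names(lines):
    """Load cleansed domain labels into an indexed in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE domains (name TEXT NOT NULL, length INTEGER NOT NULL)")
    # Cleansing: keep the first field, lowercase it, drop blanks and duplicates.
    names = {line.split()[0].lower() for line in lines if line.strip()}
    db.executemany(
        "INSERT INTO domains VALUES (?, ?)",
        [(name, len(name)) for name in sorted(names)],
    )
    # The index lets length lookups avoid scanning the whole table.
    db.execute("CREATE INDEX idx_domains_length ON domains (length)")
    return db


def names_of_length(db, length):
    """Return every registered label with exactly the given length."""
    rows = db.execute(
        "SELECT name FROM domains WHERE length = ? ORDER BY name", (length,)
    )
    return [row[0] for row in rows]
```

Calling `names_of_length(import_names(SAMPLE_ZONE_LINES), 4)` pulls out the four-letter labels, which is the flavour of low-hanging query described above.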

Over the next week I only had a free moment here and there, so I belatedly put up a follow-up article, in my haste skipping many of the tests that I had promised (for instance the English language queries, which I only finally finished at Lee's request).

My interest was short-term, and my technique was mostly driven by the biggest bang-for-the-buck queries that would yield interesting blog material, while allowing me to save my time for my family and my profession (in that order). I wasn't really sitting there month after month anxiously watching streaming domain name data, inferring complex patterns like a savant. Instead it was a couple of low-hanging fruit queries against the imported dataset, writing up the results when it was unexpected or entertaining.

Of course then the material seemed to be exhausted (the follow-up article saw much less attention), and my personal curiosity waned. The database then sat and collected bitular rust.

It's a marvel that it didn't get deleted to free up room for my prime-number database, or my ridiculously expanding set of digital pictures.

Then Lee called, I fired up the database -- to my surprise I still had it -- and the rest is history.

On Making A Hugely Popular Blog Entry

Populism has seldom been a goal of these entries, but a couple of entries, not to mention observations of the meme sites, have given me some insights into the elements that increase the probability of an entry taking off. Let's just say that I'm the World's Pre-Eminent Meme Site Popularity Assessment Expert (WPEMSPA, aka Wimpy-Sumpa).

  • Obvious topic, and easy consumption - There is a general inverse correlation between the number of words in an article/entry and its popularity on sites such as Digg and Reddit, not to mention blog references. This certainly isn't universal -- I've come across some incredible essays on those sites -- but the easiest pages to get widely linked include a relatively significant amount of graphics, or a very limited amount of text.

    Note the predominance of "top 10" style entries, where readers don't even have to read the summaries, instead deriving impressions based only on the list positions.

    Bloggers love to reference these sorts of articles because it saves time "RTFMing", allowing them to skim and post summaries of dozens of stories they've seen across the web without the hassle of actually reading them. Generally the only lengthy articles that get widely linked are those from sites such as the NY Times, or by long known industry figures such as Paul Graham, because people can skim and just assume the rest of the content, presuming that the author or publisher assures them that the rest must be half decent as well.
  • Everyone can relate - Almost everyone, including the non-technical, has contemplated the idea of creating the next .COM success story, jetting around in their own personal 737. Many have visited a registrar, desperately punching in combinations in hopes that by some amazing coincidence no one ever bothered registering cars.com and similar low-hanging domains. They heard the get-rich-quick stories, so they want to get rich. And quick!

That's about it. Create entries covering everyday topics, and populate them with easy to digest graphics, and summaries that give cursory linkubators comfort that they're linking something interesting.

Enjoy the endless incoming traffic!

Monday, July 31 2006

"That design might work for a stateful desktop app, but it isn't appropriate for the stateless web."

"O/RM isn't appropriate for stateless environments like HTTP!"

"This component wasn't made for the stateless environment of HTTP!"

"...but HTTP is stateless!"

If you've done any sort of web development, you've probably heard proclamations like these. You may have even made them yourself.

But what do they really mean? Do they add any value to the conversation?

So What Does Stateless Mean Anyways?

Stateless refers to an architecture where each HTTP request is fundamentally detached from requests that came before, and unrelated to requests that will follow.

In a stateless world, the browser initiates a TCP connection - traditionally on port 80, or port 443 if it's a secure connection - and then sends some basic commands, such as the desired document (e.g. /images/coolpicture.jpg), along with per-request preferences like the user's desired language.

With no prior information about the caller - acting only on the newly generated information in the request (e.g. the document requested, along with user submitted form values) - the server sends the results.

> GET /images/coolpicture.jpg

< the binary data for /images/coolpicture.jpg..

After the single request is serviced, the connection is torn down in this stateless scenario. The desired goal was to service each request as quickly as possible, freeing the resource-heavy, finite-quantity connection to service other callers.

Maximum output with minimum resources.
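To make the one-shot exchange concrete, here is a minimal sketch using Python's standard library. It is an illustration under assumptions rather than how any real web server is built: the `DOCUMENTS` table, the handler, and the `fetch` and `demo` helpers are all made up for this example, and everything runs over the loopback interface on whatever free port the operating system hands out.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the documents a server would read from disk.
DOCUMENTS = {"/images/coolpicture.jpg": b"...binary data for the picture..."}


class StatelessHandler(BaseHTTPRequestHandler):
    """Answers each request using only what that request itself contains."""

    def do_GET(self):
        body = DOCUMENTS.get(self.path)
        self.send_response(200 if body is not None else 404)
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body or b"not found")

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch(port, path):
    """Open a connection, make one request, read the reply, tear it down."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection: nothing lingers
                break
            chunks.append(data)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    return head.split(b"\r\n")[0].decode("ascii"), body


def demo():
    """Serve one request from a throwaway local server and return the result."""
    server = HTTPServer(("127.0.0.1", 0), StatelessHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(server.server_port, "/images/coolpicture.jpg")
    finally:
        server.shutdown()
        server.server_close()
```

Note that the handler never consults anything about the caller beyond the path in the request, and the client knows the reply is complete only because the server hangs up.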

This served the early web very well. Mirsky's Worst of the Web could be served out to thousands of anonymous consumers with gusto on minimal hardware, fulfilling the liberal information sharing origins of HTTP.

Stateless In The Non-Internet World

For a historic analogy, think of the 411 telephone service - you dial the number and establish the connection. You tell the operator the person whose number you require, and they provide a number in response. The call is disconnected, freeing the line and the operator for the next caller.

This is stateless in that the service relies upon no contextual information preceding the call to provide the service, allowing a small number of operators and connections to handle a large number of lookup requests, needing no resources beyond a simple phone book.

A stateful 411, on the other hand, would be one where you called 411 and left the phone off the hook, maintaining the connection for perhaps days at a time. With each number lookup request, they would try to interpret what you really mean based upon the requests that came before.

"Earlier you asked for a bait store on Main street, and now you're looking for a tackle store. I'm going to guess that you probably want one on or near Main street. The number is..."

Such a stateful connection wouldn't even require you to maintain the call - they could just pull up your records based upon the calling phone number, immediately having the history of your interactions to draw from in a stateful manner, regardless of the transience of the individual call.

Stateful Back In The Internet World

The stateless definition of HTTP was used to contrast with existing services like telnet and FTP, where a TCP connection (itself a stateful protocol) was made, after which a state was maintained and modified from command to command -- whether you were logged in, what directory you were in, what application was running, and so on.

The state was alive and changing until the connection was dropped, with a block of server resources dedicated to keeping alive a world just for you.

That design worked for those services because connections were generally "higher value" per request - a long running file transfer that couldn't serve many clients anyways, as a function of the large number of bytes per request; a professor running some batch jobs; etc.

Bridging the Gap

Most readers will know that almost all websites these days appear to be stateful.

You log on. It presents data that is specific to you, using preferences that are individual to you. As you do things, the environment changes and adapts, incorporating your interactions into following requests.

This isn't just an illusion, or a bastardization of the web: THESE WEBSITES ARE STATEFUL.

So how did the web sneak up and become stateful on everyone? Well, generally via the magic of cookies (alternately via URL-appended session identifiers to simulate cookies), an addition to the HTTP protocol that was first implemented by Netscape back in 1994.

A session cookie is often nothing more than a unique identifier (preferably with enough entropy that users can't guess each other's session identifiers, for instance a randomly generated GUID), passed to the server on each request, allowing the web server to tie requests together, building a set of session data to provide state for a given client. The logon form changes how the home page renders, which changes the topic listing, which changes the calendar selector, which changes the news view, and so on, with each page having available a set of stateful information about the client, forming a sort of virtual "persistent connection" over many individual, seemingly isolated HTTP requests.
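The mechanism can be sketched in a few lines of Python. This is a toy under assumptions, not any particular framework's session handling: the in-memory `SESSIONS` dictionary, the `session_id` cookie name, and the `handle_request` function are all invented for illustration, and a real deployment would also expire sessions and serve the cookie over a secure connection.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session id -> per-client state, held in server memory


def handle_request(cookie_header, form=None):
    """Tie one isolated request to the session named in its cookie.

    Returns (session_state, set_cookie_header_or_None).
    """
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    session_id = morsel.value if morsel else None

    set_cookie = None
    if session_id not in SESSIONS:
        # Unknown or missing identifier: start a fresh session with an
        # unguessable id (128 random bits, comparable to a random GUID).
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = {"user": None, "requests": 0}
        set_cookie = f"session_id={session_id}; HttpOnly; Path=/"

    state = SESSIONS[session_id]
    state["requests"] += 1
    if form and "user" in form:  # e.g. the logon form was submitted
        state["user"] = form["user"]
    return state, set_cookie
```

The first request arrives with no cookie and gets one back; every later request that presents it finds the state left behind by its predecessors, even though each arrives as a separate HTTP request.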

"Ha! Got You! There Isn't A Constant Connection! So It's Stateless!?"

Ignoring the fact that in the modern world HTTP connections are reused (given that a client will often request dozens or more documents to build a single page - or in the case of Digg about 37,528 - it was found to be cheaper to just let the client reuse a built connection for multiple requests), often people differentiate HTTP from being "stateful" because it doesn't maintain a constant connection for the entire session.
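Connection reuse is easy to observe for yourself. The sketch below is one way to do it with Python's standard library, assuming a throwaway local server: the handler simply echoes back the client's source port, which stays the same for as long as the same TCP connection is in use. The `KeepAliveHandler` class and the `client_ports_seen` helper are hypothetical names for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections open by default

    def do_GET(self):
        # The client's source port identifies the TCP connection it arrived on.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def client_ports_seen(paths):
    """Request each path over one HTTPConnection; return the ports the server saw."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    try:
        ports = []
        for path in paths:
            conn.request("GET", path)
            ports.append(conn.getresponse().read().decode("ascii"))
        return ports
    finally:
        conn.close()  # close first so the server's handler can finish
        server.shutdown()
        server.server_close()
```

Several documents requested this way should all report the same port, meaning they travelled over a single built connection rather than one connection apiece.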

Yet what is a connection? In this case it would be TCP, a "stateful" protocol. TCP is stateful in that it changes based upon what has happened before, and each packet for the duration of a connection relies upon those before it getting through okay.

You can establish a connection, let it sit for a while, and occasionally pass data back and forth.

TCP is stateful in contrast to IP (or its very light encapsulation, UDP), which deals in individual packets that live or die by themselves, with no consciousness of packets that came before, or those that will follow.

But wait, isn't it TCP/IP? TCP on top of IP?

Why yes, it is. TCP is fundamentally "IP with cookies", allowing it to maintain session state, tying many stateless packets together into a nice, clean stateful correspondence. This differs little from HTTP with cookies, which is a fundamentally stateful protocol in virtually any post-1996 implementation, where sessions and statefulness are the norm.

The Web Isn't Stateless!

So why does everyone keep yabbering nonsense about HTTP being stateless (pedantically true, but practically irrelevant and entirely misleading)? Why do so many people talk about the web being stateless in the face of endless contradictory evidence?

I think it's just a cop out: People want to validate their crappy web apps - possibly due to laziness or a desire to migrate back to fat apps - so they clutch onto the justification that it's a fundamental limitation of the platform that limits their abilities, constrains their design or forces them into hackish implementations.

In reality, the web that we've been developing against for the past 10 years has allowed tremendous statefulness, including building up and maintaining enormous quantities of server-side state for every session (just like a fat app or a DCOM component): Just because that isn't appropriate for a very high volume, low value-per-transaction anonymous user website should in no way guide you in your implementation of a low user count, very high value-per-transaction vertical market web app.

You have the ability, and the mandate, to do what's right for the problem, and no one solution or dogma fits all web needs.

Saturday, August 19 2006

An oft referenced problem in the Windows world is .Dll Hell (*). It occurs when many applications depend upon the code in a shared .dll (a dynamic link library, which is basically code that is linked at runtime rather than compile time). Sharing is often an ideal scenario, given that you can fix security faults in one single location rather than recompiling and redistributing every application that statically links the library, or searching for disparate private copies scattered across volumes. Problems start to happen, however, if the dll is changed in a way that breaks some of the dependent consumers (for instance, one of the applications rolls out a new version that changes the external API), causing inconsistencies or outright failures in other applications.

[* - Sidenote: The Wikipedia article linked from DLL Hell claims that the term was "introduced to the general public by Rick Anderson" in 2000. This is, of course, complete and utter nonsense -- it was a very common piece of terminology many years earlier, and an MSDN article hardly introduces it to the "general public". I come across these sorts of historical revisionisms on Wikipedia far too many times. Is it a Wikiality? I suppose I "introduced SVG to the general public" when I wrote a "paper" for MSDN Magazine, so I should go claim my crown...]

While the problem already existed for classic-code dlls that were stored in a shared location (usually for space-saving reasons), it really became a problem with OCX/COM, where the activation architecture basically demanded that you use the shared copy.

In spirit, similar problems occur even with high-level platforms such as Apache, or even just modules like PHP, where a version change can break a lot of applications that run atop it or depend upon it, causing significant heartache, and making deployment issues much more complex (particularly when you have multiple dependent applications, some of them more adaptable than others).

There have been many declarations of an "end to DLL hell!", with Microsoft pushing various approaches and strategies, to varying success.

With .NET, the solution is generally "share nothing", to the point that even the various versions of the .NET runtime exist as islands. A .NET 2.0 application keeps all of its libraries local, even if they're components used by dozens of applications (libraries are often version linked, so if the same library exists in many applications but the versions differ slightly, each is loaded separately and mapped individually), and uses the .NET 2.0 framework island and runtime, while a .NET 1.1 application exists in its own little world, and the same goes for a .NET 1.0 app. There still exists a classic "shared activation" model via the global assembly cache (GAC); however, it's a little-used bit of infrastructure.

Storage space is incredibly cheap, and memory space is becoming a non-issue, so this sort of approach has a lot of merit.

Why not take it a level higher? With massively powerful servers, seemingly endless memory, and free virtual server products (from both VMware and Microsoft), we're entering an era when it is entirely possible, and often ideal, to release your product as a complete virtual server.

Of course, I'm repeating myself now, but this idea really appeals to me.

Some time back, for instance, I was considering making a commercial, corporate web application timesheet tracking system (I've made some of these before. One particular one - an AJAXish DHTML solution I made back in the late 90s - I still think beats out most of what I see today). However, a hosted model wouldn't fly with most customers given the amount of information that could be garnered from their timesheets: Many customers would want to host it themselves. Yet then you face the dilemma of releasing a product that can exist within their current architecture and skillset, a particularly onerous task given the many dependencies of a modern web application.

Inevitably you'd be putting yourself out of contention for a lot of customers because you used X instead of Y, and would be endlessly fielding support issues when their platform changed faster (or slower) than your application did.

So why not release your web application (or any type of application) on an "appliance" virtual machine, as it's now getting named? The same goes for application "consumption": If you're a Windows shop, instead of hosting your wiki on Windows, or far worse limiting your choices to the small selection of options that exist for your particular ecosystem of dependencies, perhaps you could just deploy a Wiki appliance with the perfectly ideal configuration of database server, web server, host operating system, and modules.

Configure your appliance to only allow port 80 traffic in (or better yet work on an appliance platform where the accessible ports on each virtual machine can be configured, perhaps by a separate "firewall" virtual machine), and live in an application model, with whatever version of MySQL, PostgreSQL, or Apache you want, custom configured in a way that perfectly matches your requirements.

Virtual machines have so many advantages, not the least of which is the ability to move them between hardware with minimal hassle. Indeed, I had exactly this scenario recently, where the Team Foundation Server application tier was running on a box that was getting a little overloaded...well it was just a virtual machine, so it was nothing more than pausing the state, moving it to another virtual server hosting box, and starting it up. This balanced the load better, and was completely transparent to the users.

There are downsides that would have to be taken into account - some shops might want a better backup solution than pausing the virtual machine and archiving the entire virtual hard drive (which is, I should mention, a wonderful capability -- the entire "machine" in one single, relatively small file, atomically copyable and restorable. In development I've used this endlessly to save various platform configurations, restoring to exactly the one that is pertinent for a particular need). However, there are endless possible, application specific solutions to this sort of problem.

There's also the issue that Microsoft doesn't take kindly to releasing virtual machines based on their software, so perhaps this is a model that works best when the software you're depending upon is freely distributable (within the confines of the license).


Dennis Forbes - Dennis Forbes is a Toronto-based software architect and technology writer