Way back in junior high I had a good friend who was a huge fan of military aircraft.
His bedroom walls were covered with huge, hard to procure and often expensive posters of these deadly devices. His desk featured an actual (albeit non-functional) 20mm shell, of the variety used in the depleted-uranium spewing gatling gun.
His favourite military fighter jet happened to be the F-15 Eagle.
Feeling a little left out, I started poring over his resources, carefully reading his encyclopedias of fighter aircraft, absorbing all of their attributes. I decided that my favourite fighter jet was the F-14 Tomcat: Clearly its ability to land on carriers, its swing-wing engineering, and the long-range Phoenix missiles it supported made it the superior aircraft.
There was no way the F-15 Eagle compared, I argued. The F-14 Tomcat was obviously the choice of those in the know. The enlightened ones, if you will.
Yet the reality -- and I think my friend Brian always knew it -- is that I chose the F-14 primarily because it wasn't the F-15. After picking a natural alternative, I started building layers and layers of justifications for my decision.
I see the same sort of thing fairly typically in software development: Big up front design versus agile designs; Getters/Setters versus fields; namespace naming guidelines of type A or type B; variable naming standards; stored procedure naming standards (or the religious "stored procedure versus dynamic SQL" argument that rages on in teams across the lands); the sorts of types to use for primary keys; the languages and platforms to choose; whether or not to use XML, and what to use it for.
So many times, it seems, people choose their positions based not on actual analysis and honest beliefs, but rather because they're countering someone else in their team -- especially when attempting to undermine authority, actual or perceived -- or they're battling someone else in their organization (that dastardly team in Sector G that's trying to get kudos by setting the development guidelines!), or they're deriding someone in the industry.
Often they're just trying to be different and difficult, and the beauty of software development is that there are many, many right ways to do it, and it's easy to find allies in discussion groups to assure one that everyone else is an idiot, and that one's new position is the One True Way.
It's easy to appeal to authority, given that there's some big name or organization that, in some form, promotes just about every software development practice and standard imaginable (Microsoft is a particularly good example of this, as throughout the organization they follow so many standards and practices, that one can easily find an example conforming with their dogma, using it as an example that it's the "Microsoft way", ignoring the many exceptions).
Of course none of this precludes disagreement on standards and processes and techniques -- people often truly disagree because they legitimately and rationally believe something different. In a team full of intelligent, self-directed professionals, such disparate beliefs and conclusions can be enormously beneficial. The problem is when interpersonal issues materialize as technical disagreements.
Today -- apparently on the front page of section 2* -- my name was mentioned in the Wall Street Journal. Unrelated to my general professional pursuits, but still it's sort of neat to appear in such a prestigious paper.
* - I don't generally read things about me, or listen to things with me. Maybe it's a superstition, but I just find it creepy.
Last Wednesday I was mentioned in the Wall Street Journal (right there on the front of the second section of one of the world's most prestigious newspapers), being referred to as the "world's pre-eminent domainologist" (an article that has been referenced in countless other sources now, including some errant attributions, such as the Toronto Star -- my hometown paper -- seemingly making me a Verisign employee, which of course I'm not).
Apparently -- or so my wife tells me, given that I don't listen to or read anything that involves me in any way, and even when she talks about this stuff I cover my ears and basically repeat "LaLaLa"s to drown it out -- it was a well-written, humorous piece. While I apparently played the part of a fringe bit-player, my name does appear quite early in the article, and that's pretty neat to me.
The mention doesn't bring me monetary rewards, and it really doesn't contribute to my professional success in any measurable way (though it's very neat being mentioned, and it was a hugely fun process working with Lee to get the raw material and provide some basic quotes, I'm not really in the business of domain names, and it isn't really a hobby of mine -- being attributed in such a way isn't really something I really want to leverage), but it is yet another weird, discordant mention in mainstream media.
So long as it isn't the notorious sort of mention, somehow it all works into my grand plan of world domination. Bwahahahaha! <rubs eyebrows>
It all began with a couple of emails from Lee. He indicated that he was from the WSJ, and was interested in talking with me about an article he was considering. After some difficulty finding a common point of availability, we finally chatted in person. This was around Wednesday of the week before.
That evening Lee recorded an initial phone interview, indicating that he had come across my article from back in March, and knew that it had seen a lot of success (for those who didn't see it, it was an article that took off like wildfire across the net, seeing front page action on Digg, Reddit, and mentions from numerous "A-list" bloggers. Quite a few of the entries on here have seen wide "link-love", but the domain name entries absolutely blew all prior -- and following -- records away, seeing close to 100,000 visitors a day for a period of time, still maintaining a lot of incoming interest).
Given that he hadn't come across similar research (he did ask if I knew anyone else doing similar research, perhaps probing to see if I was just a sub-eminent domainologist, and perhaps I would defer to a greater authority), he decided to base his article on information I provided, both in the initial article and numerous follow-up queries he asked me to run.
One particularly exhaustive query took around 20 hours of runtime.
All in all it was a lot of fun, and from my end was nothing more than a couple of very brief phone interviews, and then some randomly kicked off queries and emailed results.
Despite the fact that the article in question provides limited personally identifying information (and while it's accurate, it is a bit misleading for some. For instance I'm not in New York City -- I'm actually here in a suburb of Toronto -- and the article of course apparently doesn't mention this blog), the immediate effect of the article was dozens of phone calls from people across the US -- and the world -- asking for my opinions on business ideas, asking if domain names people held were good ones, asking if I was interested in partnering on some project or other, asking how to get access to the raw data (see the comments in the main entry -- there's a link to the fax forms), and asking how I ended up being referenced in a WSJ article.
This blog also saw a lot of activity because of the article, with a number of people coming here after searching up obvious terms like "Dennis Forbes domain name". I'm still seeing WSJ-related search activity today (maybe hermits are just adding the issue to their apartment newspaper mountains).
I've received requests for radio interviews (I've done a couple of those before, and it isn't my favourite genre: I'm too full of self-doubt when it comes to accuracy, and mortally fear the possibility of saying something incorrect in response to an adhoc question. In such an instance I'd rather say nothing until I can verify, with certainty, that what I'm saying is correct. I haven't been "blessed" with the arrogance and confidence that allows some to make the most absurd of proclamations with zero self-doubt or hesitation), and have gotten requests for, and responded to, several email interviews.
All in all a very entertaining process, and it was interesting to take part in it. It has me looking for my next angle for media exposure.
Of course Lee was being facetious when he assigned me this title, and really I found it gut-bustingly hilarious when I heard it myself.
The original domain name article actually came about because I needed a medium-sized database to demonstrate high-performance database operations. While I was indeed curious about domain names, ultimately I requested access purely to have a large set of data to demonstrate some index-backed operations. I was shocked when I discovered that one could actually acquire a copy of the zone file.
I really haven't been poring over zone files for years, amazingly reading trends and consistencies from streams of raw data.
After receiving the data, I saw that it really was interesting and entertaining, so in a single night I threw together the original article: Right after getting my credentials from Verisign, I downloaded the 850MB compressed file, extracted, imported and cleansed it, and then ran some humorous queries to see if it yielded interesting results. Seeing some of the answers, I thought it would be good blog fodder so I tossed an article together and put it online.
Over the next week I only had a free moment here and there, so I belatedly put up a follow-up article, in my haste skipping many of the tests that I had promised (for instance the English language queries, which I only finally finished at Lee's request).
My interest was short-term, and my technique was mostly driven by the biggest bang-for-the-buck queries that would yield interesting blog material, while allowing me to save my time for my family and my profession (in that order). I wasn't really sitting there month after month anxiously watching streaming domain name data, inferring complex patterns like a savant. Instead it was a couple of low-hanging fruit queries against the imported dataset, writing up the results when it was unexpected or entertaining.
Of course then the material seemed to be exhausted (the follow-up article saw much less attention), and my personal curiosity waned. The database then sat and collected bitular rust.
It's a marvel that it didn't get deleted to free up room for my prime-number database, or my ridiculously expanding set of digital pictures.
Then Lee called, I fired up the database -- to my surprise I still had it -- and the rest is history.
Populism has seldom been a goal of these entries, but a couple of entries, not to mention observations of the meme sites, have given me some insights into what are some elements that increase the probability of an entry taking off. Let's just say that I'm the World's Pre-Eminent Meme Site Popularity Assessment Expert (WPEMSPA, aka Wimpy-Sumpa).
That's about it. Create entries covering everyday topics, and populate them with easy-to-digest graphics, and summaries that give cursory linkubators comfort that they're linking something interesting.
Enjoy the endless incoming traffic!
"That design might work for a stateful desktop app, but it isn't appropriate for the stateless web."
"O/RM isn't appropriate for stateless environments like HTTP!"
"This component wasn't made for the stateless environment of HTTP!"
"...but HTTP is stateless!"
If you've done any sort of web development, you've probably heard proclamations like these. You may have even made them yourself.
But what do they really mean? Do they add any value to the conversation?
Stateless refers to an architecture where each HTTP request is fundamentally detached from requests that came before, and unrelated to requests that will follow.
In a stateless world, the browser initiates a TCP connection (traditionally on port 80, or on port 443 if it's a secure connection) and then sends some basic commands, such as the desired document (e.g. /images/coolpicture.jpg), along with per-request preferences like the user's desired language.
With no prior information about the caller - acting only on the newly generated information in the request (e.g. the document requested, along with user submitted form values) - the server sends the results.
> GET /images/coolpicture.jpg
< the binary data for /images/coolpicture.jpg..
After the single request is serviced, the connection is torn down in this stateless scenario. The goal was to service each request as quickly as possible, freeing the resource-heavy, finite-quantity connection to service other callers.
Maximum output with minimum resources.
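As a minimal sketch of that stateless exchange, consider the following Python. The `DOCUMENTS` table and the `handle_request` function are hypothetical names made up for illustration, not part of any real web server. The point is only that the handler sees nothing but the request in front of it, so two identical requests get identical answers no matter what came before.

```python
# Hypothetical in-memory document store standing in for files on disk.
DOCUMENTS = {
    "/images/coolpicture.jpg": b"<binary data for coolpicture.jpg>",
}


def handle_request(request_line: str) -> tuple[int, bytes]:
    """Serve one request using nothing but the request itself."""
    parts = request_line.split()
    if len(parts) < 2 or parts[0] != "GET":
        return 400, b"bad request"
    body = DOCUMENTS.get(parts[1])
    if body is None:
        return 404, b"not found"
    return 200, body


# No memory of the first call influences the second one.
first = handle_request("GET /images/coolpicture.jpg")
second = handle_request("GET /images/coolpicture.jpg")
```

Nothing is stored between calls, which is exactly what lets a small server answer a large number of anonymous callers.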
This served the early web very well. Mirsky's Worst of the Web could be served out to thousands of anonymous consumers with gusto on minimal hardware, fulfilling the liberal information sharing origins of HTTP.
For a historic analogy, think of the 411 telephone service - you dial the number and establish the connection. You tell the operator the person whose number you require, and they provide a number in response. The call is disconnected, freeing the line and the operator for the next caller.
This is stateless in that the service relies upon no contextual information preceding the call to provide the service, allowing a small number of operators and connections to handle a large number of lookup requests, needing no resources beyond a simple phone book.
A stateful 411, on the other hand, would be one where you called 411 and left the phone off the hook, maintaining the connection for perhaps days at a time. With each number lookup request, they would try to interpret what you really mean based upon the requests that came before.
"Earlier you asked for a bait store on Main street, and now you're looking for a tackle store. I'm going to guess that you probably want one on or near Main street. The number is..."
Such a stateful connection wouldn't even require you to maintain the call - they could just pull up your records based upon the calling phone number, immediately having the history of your interactions to draw from in a stateful manner, regardless of the transience of the individual call.
The stateless definition of HTTP was used to contrast with existing services like telnet and FTP, where a TCP connection (itself a stateful protocol) was made, after which a state was maintained and modified from command to command -- whether you were logged in, what directory you were in, what application was running, and so on.
The state was alive and changing until the connection was dropped, with a block of server resources dedicated to keeping alive a world just for you.
That design worked for those services because connections were generally "higher value" per request - a long running file transfer that couldn't serve many clients anyways, as a function of the large number of bytes per request; a professor running some batch jobs; etc.
Most readers will know that almost all websites these days appear to be stateful.
You log on. It presents data that is specific to you, using preferences that are individual to you. As you do things, the environment changes and adapts, incorporating your interactions into following requests.
This isn't just an illusion, or a bastardization of the web: THESE WEBSITES ARE STATEFUL.
So how did the web sneak up and become stateful on everyone? Well, generally via the magic of cookies (alternately via URL-appended session identifiers to simulate cookies), an addition to the HTTP protocol that was first implemented by Netscape back in 1995.
A session cookie is often nothing more than a unique identifier (preferably with enough entropy that users can't guess each other's session identifiers, for instance a randomly generated GUID), passed to the server on each request, allowing the web server to tie requests together, building a set of session data to provide state for a given client - the logon form changes the home page render changes the topic listing changes the calendar selector changes the news view, and so on, with each page having available a set of stateful information about the client, forming a sort of virtual "persistent connection" over many individual, seemingly isolated HTTP requests.
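Here is a rough sketch of that mechanism in Python, under some loud assumptions: the `SESSIONS` dictionary, the `handle` function, and the `session_id` cookie name are all hypothetical, and a real framework would add expiry, signing, and persistent storage. It only shows how a random identifier handed back on each request lets the server stitch isolated requests into one session.

```python
import secrets

# Hypothetical server-side session store: session id -> per-client state.
SESSIONS: dict[str, dict] = {}


def handle(cookies: dict, action: str) -> tuple[dict, dict]:
    """Return (cookies to send back, session state after this request)."""
    session_id = cookies.get("session_id")
    if session_id not in SESSIONS:
        # High-entropy identifier so clients can't guess each other's sessions.
        session_id = secrets.token_urlsafe(32)
        SESSIONS[session_id] = {"history": []}
    state = SESSIONS[session_id]
    state["history"].append(action)
    return {"session_id": session_id}, state


# Two requests from the same client, carrying the cookie back each time.
cookies, _ = handle({}, "logon")
cookies, state = handle(cookies, "view topics")

# A different client with no cookie gets a fresh, unrelated session.
_, stranger = handle({}, "view topics")
```

After these calls `state["history"]` holds both of the first client's actions, while the stranger's session knows only about its own single request.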
Ignoring the fact that in the modern world HTTP connections are reused (given that a client will often request dozens or more documents to build a single page - or in the case of Digg about 37,528 - it was found to be cheaper to just let the client reuse a built connection for multiple requests), often people differentiate HTTP from being "stateful" because it doesn't maintain a constant connection for the entire session.
Yet what is a connection? In this case it would be TCP, a "stateful" protocol. TCP is stateful in that it changes based upon what has happened before, and each packet for the duration of a connection relies upon those before them getting through okay.
You can establish a connection, let it sit for a while, and occasionally pass data back and forth.
TCP is stateful in contrast to IP (or its very light encapsulation, UDP), which is individual packets that live or die by themselves, with no consciousness of packets that came before, or those that will follow.
But wait, isn't it TCP/IP? TCP on top of IP?
Why yes, it is. TCP is fundamentally "IP with cookies", allowing it to maintain session state, tying many stateless packets together into a nice, clean stateful correspondence. This differs little from HTTP with cookies, a fundamentally stateful protocol when coupled in virtually any post-1996 implementation, where the idea of sessions and statefulness are the norm.
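To make the "IP with cookies" comparison concrete, here is a deliberately oversimplified Python sketch. The packet tuples, the connection identifiers, and the `reassemble` function are assumptions for illustration only; real TCP adds handshakes, acknowledgements, retransmission, and byte-level sequence numbers. Each packet stands alone, yet a shared identifier plus a sequence number is enough to rebuild an ordered conversation.

```python
from collections import defaultdict

# Each "packet" is independent, like an IP datagram:
# (connection id, sequence number, payload). They arrive out of order
# and interleaved with another connection's traffic.
packets = [
    ("conn-a", 1, b"lo, "),
    ("conn-b", 0, b"other stream"),
    ("conn-a", 0, b"hel"),
    ("conn-a", 2, b"world"),
]


def reassemble(packets):
    """Group packets by connection id and join payloads in sequence order."""
    streams = defaultdict(dict)
    for conn_id, seq, payload in packets:
        streams[conn_id][seq] = payload
    return {
        conn_id: b"".join(chunks[seq] for seq in sorted(chunks))
        for conn_id, chunks in streams.items()
    }


streams = reassemble(packets)
```

The connection identifier plays the same role here that the session cookie plays for HTTP: it is the only thing tying otherwise unrelated messages together.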
So why does everyone keep yabbering nonsense about HTTP being stateless (pedantically true, but practically irrelevant and entirely misleading)? Why do so many people talk about the web being stateless in the face of endless contradictory evidence?
I think it's just a cop out: People want to validate their crappy web apps - possibly due to laziness or a desire to migrate back to fat apps - so they clutch onto the justification that it's a fundamental limitation of the platform that limits their abilities, constrains their design or forces them into hackish implementations.
In reality, the web that we've been developing against for the past 10 years has allowed tremendous statefulness, including building up and maintaining enormous quantities of server-side state for every session (just like a fat app or a DCOM component): Just because that isn't appropriate for a very high volume, low value-per-transaction anonymous user website should in no way guide you in your implementation of a low user count, very high value-per-transaction vertical market web app.
You have the ability, and the mandate, to do what's right for the problem, and no one solution or dogma fits all web needs.
The grade 9 English assignment demanded that each of us write a 2- or 3-page essay describing how we would improve society: What would we do to improve the quality of life for all Canadians?
This was 1986, just after pro-"wrestling" and break-dancing fads started thankfully fading from mass appeal. This was an era when global nuclear warfare (would you like to play a game?) still seemed not only possible, but probable - though Gorbachev's glasnost policies were definitely reducing tensions from the paranoid levels a few years earlier - so the concepts of freedom, democracy, and bomb-shelters appeared in quite a few of the submissions.
For my submission - a creation of pure literary genius, or so I thought - I combined the fundamentals of democracy with my recent discovery of local BBSs. I hypothesized that soon we'd be in a nationally connected world that would allow citizens from coast to coast to communicate with each other and access common resources on their home computer (be it Vic-20, or ultra-high end Commodore-64). Basically I was just describing the existing packet-switched commercial services, and the burgeoning Internet (I later created a multi-"channel", packet-oriented modem protocol on my Atari ST - basically a really primitive, foolish version of TCP), but envisioned it as a government built, publicly owned system, supplying every Canadian with this basic piece of infrastructure.
With this data communications network, I argued, we could finally build a system where government could be implemented as a pure democracy.
No longer would we have to elect local officials to carry our agenda to parliament, but now we could simply put every policy question to the people, allowing the populace to directly decide how the country will be governed.
I was certain that my idea was brilliant, and was a little disappointed when it was returned to me after marking (I was sure it would be passed on to important people for implementation. Perhaps they photocopied it?). Aside from a mark, the teacher - whose name I don't recall, though I do remember that she and her husband owned a car dealership - wrote a rather cryptic line about it being idealist and unworkable, which stung a bit, which is why this story sticks with me still today.
If for some reason I were required to rewrite that paper today, I'd probably concede the teacher's point. I would thank her for giving me a bit of cynicism and insight that perhaps I didn't have before.
After years of watching public opinion ebb and flow, I'm now of the opinion that a pure democracy would be an absolute disaster.
Apart from the fact that it would likely lead to a tyranny of the majority (where 50%+1 = someone else's agenda forced down your throat. This is of course how our system generally currently works, but in a more time-limited, detached manner), the core problem is that many voters simply don't take the responsibility required even for once-every-4-year trips to the ballot box, much less flippantly deciding each and every issue facing the government.
I certainly don't believe in an illuminati running government, and a democracy is empirically the ultimate form of government, but the small disconnect between the public and the government (the interface being elected representatives who are accountable for the government and its decisions) allows government to do what is necessary and right.
In essence it allows government to take actions that might temporarily anger the public in a short-sighted manner, but which we'll come to appreciate as time goes on. There have been countless times where polls indicated that the public was significantly against or behind X, but to follow such an agenda would be disastrous. The government largely ignores such polls, not falling to populism and pandering to every shift in perception, and it blows over and the public sees the big picture.
Imagine if, instead, every night when you came home (or perhaps only once a week) there were several policy issues requiring your input.
"Should the bridge to the island airport be built?"
"Should the department of departments be privatized?"
Consider how little thought and attention people give to their several-times-a-decade visit to the ballot box. Now imagine how much worse it would be if everyone were questioned on every single government initiative.
"Well then make voting optional! Let the people who know about the topic vote!"
Such an opt-in arrangement is how "special interests" are born, and it's how they have much more sway than they perhaps should.
Stephen Colbert did a humorous segment on truthiness last night, this time on the topic of historical revisionism. You can view the clip at http://www.youtube.com/watch?v=zmHm0rGns4I.
What made this segment particularly famous (or infamous) were Stephen's (Mr. Colbert's?) comments regarding the illustrious Wikipedia: After indicating that he was revising some entries to alter history, pretending to do it on a laptop during taping (a supposed Stephencolbert user pretty much simultaneously -- to the airing, not the taping -- made a couple of edits correlating with the show. These edits were on the topic of George Washington and The Colbert Report recurring elements, exactly as indicated on the show. This user could just as easily have been a third party following along, but the effect is the same, and is just as humorous), he then coined the new word Wikiality.
What really raised the ire of the Wikipedia defenders, however, was Mr. Colbert's humorous petition for users to support him in his quest for historical revisionism, altering the Wikipedia entry for elephants to support a fictional 3x increase in the total population over the past 6 months ("Explain that Al Gore!"). Many played along, until eventually the page in question (and virtually all other pages related to elephants) was locked to avoid this jovial vandalism.
Personally I think Stephen made a brilliant point, even if there was a bit of collateral damage. Some of the reaction to it has simply been ridiculous.
Of course these, err, truths hold for more than just Wikipedia: Virtually any user-contributed site faces the same problems.
Reddit, for instance -- an up and coming meme site -- lets users "vote" which links and comments are most, well, in line with one's own view (while the links get rated on what is occasionally a meritocracy, the voting on comments is usually extremely one-sided, having very little to do with presenting a valid, well-spoken argument, and more to do with saying something that correlates with every fly-by voter's opinion. It is actually embarrassing seeing your own comment scored up because you happen to share the majority view, while the well-written and convincing posts of your adversary sink into underflow territory).
Using an apparently basic votes-over-time algorithm, the app determines which links to put on the front page -- the front page is obviously a desirable place to be for someone trying to push a perspective or an agenda (remember that the majority of users on most of these sites are lurkers - while many people are set in their position, and are valiant, tireless crusaders for the cause, there are a lot of people who are on the fence, absorbing whatever information on a topic is presented to them, willing to change their position based upon new inputs). My personal experience here has been that a Reddit front page isn't anything like a Digg front page in the volume of traffic it sends to you, but it still brings in a considerable number of users ready and waiting to be stuffed with one's perspective.
So how does one get on the front page? Well aside from pandering to the natural bias of the Reddit crowd (a crowd that leans towards libertarianism/anti-authority/extreme liberalism/Lispism, a demographic that is heavily reflected in the vote patterns), getting on the front page can be accomplished with little more than a dozen votes over a short period of time (one vote per IP, folks). Staying on the front page for a work day can be accomplished with just a couple hundred up votes.
Topping the all time record books in the Reddit universe takes less than 900 up votes after the negations have been subtracted out.
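To see why so few votes can matter, here is a minimal Python sketch of a votes-over-time ranking. The formula, the `gravity` exponent, and the two-hour offset are assumptions picked purely for illustration; this is not Reddit's actual algorithm, which isn't described here. It only shows the general shape: net votes divided by a penalty that grows with age.

```python
def score(up_votes: int, down_votes: int, age_hours: float,
          gravity: float = 1.5) -> float:
    """Hypothetical ranking: net votes decayed by the age of the link."""
    net = up_votes - down_votes
    # The +2 offset (an arbitrary choice) keeps brand-new links from
    # dividing by a value near zero.
    return net / (age_hours + 2) ** gravity


# A dozen quick votes on a half-hour-old link...
fresh = score(up_votes=12, down_votes=0, age_hours=0.5)

# ...versus a couple hundred votes on a day-old link.
stale = score(up_votes=200, down_votes=0, age_hours=24)
```

Under these made-up parameters the fresh link outranks the day-old one despite having a small fraction of the votes, which is the kind of leverage a small coordinated group would be counting on.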
How hard is it for a special interest to manipulate a site like this? The Israeli support site up above is fairly open and inclusive in their advocacy, but surely all such groups aren't so forthright -- Microsoft has some 60,000 employees, and while they have a limited number of work IPs, there are certainly 10s of thousands of home IPs that can be used to push an agenda. The same goes for oil companies, and virtually any other large organization.
While I hardly think an organization like Microsoft is going to enlist employees in a concerted astroturfing drive, is it difficult to imagine that there are similar groups doing something similar right now? I've already pointed out the Israel support orchestration above (and for those who would argue that their votes are legitimate, the problem is that it's completely disproportionate. It's why user-initiated poll submissions and feedback comments are usually absurdly skewed and not-correlating with reality - the people voting often have a vested interest motivating their actions), and every bit of common sense says that they aren't alone.
Once this approach has been mastered at Reddit, move on to the larger meme sites like Digg - a couple of thousand votes on Digg is all it takes. Hire some botnet authors if need be.
(Note: This isn't intended to be a "how to", but this issue has bothered me for a while. Before hearing about the Giyus site mentioned above, I had already considered making an opinion swarming coordination web app, allowing groups to administer and privately coordinate opinion bombing runs. My goal was to highlight a potential problem, rather than enabling this sort of activity)