Dennis Forbes on Software and Technology


About the Author
Dennis Forbes is a Toronto-based software architect. While focused primarily on the .NET and SQL Server worlds, Dennis frequently ventures outside of this comfort zone into game development and image processing. He has been published in several industry magazines, has been quoted in the Wall Street Journal and has been interviewed by NPR.

He is a vice president and lead software architect at an innovative New York City hedge fund back-office services firm.

Dennis has been working on solutions for the financial, telecommunications, and power generation markets for over 15 years.




Friday, March 31 2006

Yesterday was quite a traffic day here on yafla.

After seeing a continuous low level of interest coming from Reddit.com since Wednesday afternoon, mid-yesterday I happened to check the stats to find them reporting ~3500 simultaneous visitors (of course they weren't actually simultaneous GETs, but rather were simultaneous from the perspective of sessions). Initially I presumed that it was a software defect, but I quickly discovered it was real, and was due to the domain name entry appearing on Digg's front page. A number of other great sites linked in as well, and later in the day some Slashdotting was added to the mix, as another entry was apparently referenced in a story there.

Many impressive sites feeding in thousands upon thousands of visitors an hour. All told, from when the interest really kicked off early in the afternoon some 36,000 visitors came through by midnight, browsing well over 100,000 pages. This high level of traffic has continued through today.

I want to give my host a breather (the excellent ISQSolutions), so I'm going to hold off publishing the follow-up to the domain name entry (where I include stats such as dictionary values, phrase variations, etc) for a day or two to let the traffic settle a bit.

Through it all the site never failed to serve up pages (during the height of it the server continued to serve pages virtually instantly), courtesy of the fact that I publish these pages rendered into static form. Not only does this avoid the unnecessary overhead of script interpreting or database access, it also allows IIS 6 to kernel-cache the pages, allowing it to serve cached pages without even leaving kernel mode.

I should also say that Digg's influence is vastly greater than I postulated previously. I've had several pages as the primary focus of Slashdot stories before, and they didn't yield the simultaneous influx that a Digg front page did.

  Personal 
Wednesday, March 29 2006

The Search For A Domain Name

I recently had a need for a mid-sized amount of real-world data, which I required for testing purposes on low-end hardware (testing and demonstrating some of the new functionality of SQL Server 2005). I wanted something that wasn't confidential, which excluded the easy choice of using business data, and I refrain from using artificial data. Around the same time I happened across the requisition process for the .COM/.NET and .EDU TLD zones, so I made a request for access.

Soon enough I had the 3.5GB of .COM domain names, along with 650MB of .NET, loaded into the database (although all results in this entry include only the .COM TLD, using the data as of 2pm on March 28th, 2006; I'll analyze the others at a future date). It was a great foundation for a lot of tests and demonstrations, and served my original goal admirably. I didn't stop there, however: curiosity led me to do some basic analysis to see what sorts of domain names are registered, and how saturated the registry really is.

Note that these are the Verisign distributed zone files, and do not include entries that have no nameservers configured, or which are in a hold state. While those comprise a very small minority of domain names, they do skew the results a bit. To improve accuracy when the sample set is small, for some of the tests I have validated the positives using the WHOIS infrastructure (for instance, the zone file showed several two-letter sequences, and a dozen three-letter sequences, as being "available"; all of them were the result of a hold state, or of no nameservers being configured). For aggregate results where they were inapplicable, I've filtered international domain names (IDN, prefaced with xn--) from the results.
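
The IDN filtering mentioned above could look something like the following. This is only a minimal sketch, assuming the names have already been pulled out of the zone file into a plain list; the function name and the sample data are hypothetical, and the zone-file parsing itself is not shown.

```python
def strip_idn(names):
    """Return the distinct, upper-cased names that are not IDN (xn--) entries."""
    kept = set()
    for raw in names:
        name = raw.strip().upper()
        # IDN entries appear in their ASCII form, prefixed with "xn--".
        if name and not name.startswith("XN--"):
            kept.add(name)
    return kept


# Hypothetical sample; the real input would be the full list of .COM names.
sample = ["yafla", "XN--BCHER-KVA", "Example", "yafla "]
print(sorted(strip_idn(sample)))
```

Normalizing case and whitespace before de-duplicating keeps the same name from being counted twice.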

You've thought up a brilliant idea for a new Web 2.0, AJAX-enabled web app, or you're about to release a thus-far-unnamed killer software app. Now you just need to find the perfect domain name for it to live at (and, in true new-economy fashion, you'll base your corporate name upon whatever available domain name you find... PILLAGEANDPLUNDR Corporation).

You pull up GoDaddy and start punching in clever names, along with their many variations, only to find that they're all seemingly taken.

"This can't be!" you cry. "Has every possibility already been registered?"

Given that there are approximately 50 million .COM domains registered, it is indeed true that the low-hanging fruit domain names are overwhelmingly taken. Your chances of lucking upon an unnoticed available three-letter acronym (TLA) are close to zero, and your only recourse would be to haggle with domain speculators.

What About Acronyms?

If you want one of the 676 possible two-letter sequences, for instance for an acronym or abbreviation, you're out of luck: They're all taken. Even allowing for digits, giving 1296 combinations, again every single variation is taken.

Of course, that's ignoring the fact that .COM registrars now mandate a 3-character minimum length, so it wouldn't be an option anyways.

Of the 17,576 possible three-letter sequences, again every single one is already taken. Adding digits to the mix (note that I'm intentionally ignoring obtuse dashes for such short domain names, though technically they are legal from the second character onwards), giving 46,656 permutations, yields a larger number of garbage domain entries (either REGISTRAR-LOCKED, REDEMPTIONPERIOD, or with no nameservers), giving a false hope of 228 seemingly open domains, yet they aren't actually available.

If you're dying to acquire great domains like 8VZ.com or Q6X.com, they'll free up within a month, though it seems evident that there are swaths of domain speculators acquiring every variant when they come available, so they won't go without a fight.

Stepping up to four-letter sequences, choosing among the 456,976 combinations, yields a vastly greater availability -- perhaps the set is a bit too large for domain speculators and their unlikely success with random sequences -- with 97,786 showing as open. A quick check verifies that most are legitimately available, including "choice" domains such as AGJV.com, EIYK.com, GZVW.com, and QFEV.com. Add digits into the mix and there are a massive 1.16 million open domains, so long as you're looking for something like 7RG8.com or U3JZ.com. Choose one and then manufacture a ridiculous backronym to explain it.

Going to 5-letter sequences (yet another five-letter acronym? YAFLA?), and of course the possibilities are rich, again presuming that you're willing to accept an arbitrary sequence of letters and/or digits, creating a backronym to match. Using just letters you have a rich 11,881,376 possibilities, of which approximately 11,015,028 are unclaimed.
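
The counts quoted in this section are just powers of the alphabet size: 26 for letters only, 36 once digits are allowed. The sketch below reproduces them and shows one way to list the unclaimed sequences of a given length. It assumes the registered names fit in an in-memory set; the helper names and the sample set are hypothetical.

```python
from itertools import product
from string import ascii_uppercase, digits

LETTERS = ascii_uppercase
LETTERS_AND_DIGITS = ascii_uppercase + digits


def space_size(length, alphabet):
    """Number of distinct sequences of `length` symbols drawn from `alphabet`."""
    return len(alphabet) ** length


def unregistered(length, registered, alphabet=LETTERS):
    """Yield every sequence of `length` symbols that is not in `registered`."""
    for combo in product(alphabet, repeat=length):
        name = "".join(combo)
        if name not in registered:
            yield name


for n in range(2, 6):
    print(n, space_size(n, LETTERS), space_size(n, LETTERS_AND_DIGITS))

# Hypothetical registered set: every two-letter name except "QZ".
registered = {a + b for a in LETTERS for b in LETTERS} - {"QZ"}
print(list(unregistered(2, registered)))
```

For lengths 2 through 5 the letters-only column gives 676, 17,576, 456,976 and 11,881,376, matching the figures above.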

How Long Are Most Domains?

Of course many of the registered domains are seldom, if ever, visited, with a huge percentage having nothing more than a parked page (users pay domain registrars to put up ads for themselves). Thus, analyzing the domain database without taking into account popularity/traffic is of limited value, but it does provide for a bit of entertainment.

As mentioned, 100% of 2 and 3 letter domain names are taken, but it starts to free up as the number of possibilities explodes, all the way up to 63-character domain names. The most popular registered domain name length is actually 11 characters long, tailing off from there.

The fun doesn't end at 31 characters, however. There are 253,000+ non-IDN domains that are 32 characters or longer, including 538 that are 63 characters long.

These include such superlative domains as ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ.com, WEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEB.com, and DIDYOUKNOWTHATYOUCANONLYHAVESIXTY-THREECHARACTERSINADOMAIN-NAME.com.
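
A length distribution like this takes only a few lines to tally. The following is a rough sketch, not the queries actually used for this entry (those ran in SQL Server); the function name and sample names are made up for illustration.

```python
from collections import Counter


def length_histogram(names):
    """Count how many names have each character length."""
    return Counter(len(name) for name in names)


# Hypothetical sample; the real input would be the full list of .COM names.
sample = ["YAFLA", "GOOGLE", "EXAMPLE", "REDDIT", "Z" * 63]
hist = length_histogram(sample)

most_common_length, count = hist.most_common(1)[0]
long_names = sum(c for length, c in hist.items() if length >= 32)
print(most_common_length, count, long_names)
```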

What About Names?

The US Census Bureau has some handy common name files available on their site, so I thought I'd see how one's luck would be trying to register their own name(s).

If you're looking for a masculine domain name, you'll be disheartened to learn that of the 1219 male names listed by the US Census Bureau, every single one is registered. If you're looking for something feminine, you're in luck: As I type this, of the 2841 female names listed by the Census, you can soon grab the lucrative recently expired Erlinda.com, or the sitting in purgatory Shanita.com, though both are technically currently taken.

On the family name front, 100% of the top 10,000 family names are registered.

Cross joining the top 300 male names with the top 300 family names finds that ~10,112 of the 90,000 possibilities aren't registered, to the benefit of anyone named Antonio Hughes and Lawrence Torres out there! Similarly, cross joining the top 300 female names with the top 300 family names finds that ~14,103 possibilities are unclaimed.
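
The cross join is simply every first name paired with every family name (300 x 300 = 90,000 candidates), each checked against the registered set. Here is a small sketch of the idea, assuming in-memory lists; the names and the registered set are hypothetical sample data, not results from the zone file.

```python
from itertools import product


def unclaimed_full_names(first_names, family_names, registered):
    """Return first+family concatenations that are not in `registered`."""
    return [
        first + family
        for first, family in product(first_names, family_names)
        if first + family not in registered
    ]


first = ["ANTONIO", "LAWRENCE", "JAMES"]
family = ["HUGHES", "TORRES", "SMITH"]
# Hypothetical registered set, for illustration only.
registered = {"JAMESSMITH", "JAMESHUGHES", "LAWRENCESMITH"}

open_names = unclaimed_full_names(first, family, registered)
print(len(open_names), "of", len(first) * len(family), "unclaimed")
```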

Domain Name Love

On the love front, 1958 (68.9%) of the 2841 possible 'ILOVE'-prefixed female names (using the census set of names) sit unclaimed, which is surprising, as only 665 (54.5%) of 1219 'ILOVE'-prefixed male names remain available.

Continuing down that path, the seedier side of the internet is hardly a secret, and it's evident in the DNS database as well. 268,971 domains contain the sequence SEX (11,333 of them also containing the sequence FREE), while 143,683 domains contain the sequence LOVE.
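
Both kinds of count in this section, prefixed names that are still open and names containing a given sequence, reduce to simple membership and substring tests. A minimal sketch follows, with hypothetical helper names and sample data.

```python
def prefixed_available(prefix, names, registered):
    """Return the names for which prefix+name is not registered."""
    return [name for name in names if prefix + name not in registered]


def count_containing(registered, *sequences):
    """Count registered names that contain every one of `sequences`."""
    return sum(all(seq in name for seq in sequences) for name in registered)


# Hypothetical data, for illustration only.
registered = {"ILOVEMARY", "FREELOVE", "LOVEBOAT", "EXAMPLE"}
names = ["MARY", "LINDA"]

print(prefixed_available("ILOVE", names, registered))
print(count_containing(registered, "LOVE"))
print(count_containing(registered, "LOVE", "FREE"))
```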

Other Tidbits

The most common letter to start a domain is S, with relatively few domains starting with Q, X, Y or Z.

The most common digit to start a domain is, unsurprisingly, 1.
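
Leading-character tallies like these can be sketched as below, splitting the counts into letters and digits. The function name and the sample names are hypothetical.

```python
from collections import Counter


def leading_char_counts(names):
    """Count the first character of each non-empty name, case-insensitively."""
    return Counter(name[0].upper() for name in names if name)


# Hypothetical sample; the real input would be the full list of .COM names.
sample = ["SLASHDOT", "SEARCH", "YAFLA", "1800FLOWERS", "123", "QUIZ"]
counts = leading_char_counts(sample)

letters = Counter({c: n for c, n in counts.items() if c.isalpha()})
leading_digits = Counter({c: n for c, n in counts.items() if c.isdigit()})
print(letters.most_common(1), leading_digits.most_common(1))
```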

Every successful company has remoras and haters, so it was interesting to look at the number of suffixed alternatives for some well-known domains. While some of these are actually owned by the root domain owner, most are hangers-on and critics.

Samples include GOOGLE-AMERICA, GOOGLE-BUDDY, MICROSOFT-EBOOKS, SLASHDOTREVIEW, SLASHDOTSLASH, and YAHOO2007.

Conclusion

Hopefully this was a bit entertaining, and maybe even informative. I'm doing a much more intriguing, large-scale analysis (again, it's a nice opportunity to demonstrate some of the new SQL Server 2005 functionality) that I'll publish soon, but these were the low-hanging fruit.

[Also see Domain Name Analysis - More Fascinating But Entirely Useless Charts]

Monday, March 27 2006

[The static location of this piece can be found here]

Roots

While I live and work in the Greater Toronto Area -- in beautiful Burlington, a town that we love and have adopted as our home -- my origins were in the humble, working-class town of St. Thomas, Ontario.

With a population of 33,000, St. Thomas' claim to fame, oddly, remains that it was the place where Jumbo the elephant was killed by a passing train in 1885. I remember being a kid in the town when they decided to further celebrate this infamy. After much debating, they ordered a statue at the grand cost of $50,000, holding a parade to celebrate its arrival.

For years the speculation raged about which high school football team would paint the elephant pink first, though I can't recall it ever actually happening. Strangely, the elephant's rear is pointed at one of the main roads coming into town.

As St. Thomas is located just a couple of hours down the highway from here (15 minutes south of London, which is another town that my wife and I lived in for several years; thus far we've remained Ontarians, although Calgary has oft enticed us), we still visit regularly to see friends and family, and did so this weekend to celebrate my son's first birthday. Thanks for hosting it, Michelle. Hopefully you don't find green icing anywhere unexpected.

Visiting St. Thomas

I took a couple of pictures of St. Thomas, usually while one or both of my children slept in the backseat. These were generally taken out of the car window, or with my son in my arms, so manage expectations accordingly. Most were taken very early on Sunday morning, after my son decided to wake up at 5:30am, and then promptly fell asleep the moment we left for a coffee run.

The relatively early hour explains the dearth of activity in the city (after a summertime all-night game of Empire during my mid-teens, my friends and I would hop on our bikes and ride down the center of the main street right after sunrise. It was surreal having no other human anywhere in sight).

A Great Stomping Grounds

Growing up in St. Thomas, at least in the pre-teenage years, was fantastic, though I didn't appreciate it as much then: The city hosts numerous natural areas, most with significant elevation features. One great area was, and probably still is, affectionately known as "suicide" because of its narrow, ultra-steep trails, bordered by countless head-cracking trees. Nothing matched the exhilaration of wailing down one on an old, brakeless BMX bike, relying on the perfectly applied use of one's foot in the front wheel forks for any stopping power...hoping to avoid locking the wheel and catapulting headfirst through the air. Another youth delight was courtesy of the railroad history of the town: an extensive network of train tracks and train bridges crisscrossing the town. These railroad networks were the sneaker highway, the fossil searching grounds, and the playground. The bridges were the source of many unintentional Stand By Me re-enactments (or rather pre-enactments).

For an adventurous, energetic child with more liberal parents, St. Thomas couldn't be beat.

For the child of "fearing-their-child-playing-in-an-inaccessible-place-called-suicide" parents, things were still pretty good: The main city parks, Waterworks and Pinafore, are both superlative public areas.

Waterworks features a swimming and wading pool, the natural fun of Kettle Creek (I marvel that I used to swim in that muddy, slug and snapping-turtle filled river), and a "waterfall" (a very small dam), and was a frequent lunchtime playground when I went to the nearby Lockes Public School. Pinafore is a large, open-area park with a very small animal exhibit (Tito the deer and friends, birds, some visiting swans, ducks, and so on), an extensive playground, bandshells, ball diamond, and picnicking areas, and a reservoir and small stream.

Many fond childhood memories involved family get-togethers in the parks, or picking up a bucket of KFC and heading down to Waterworks for some swimming. During the summer the city works department does an amazing job with the flowers in both parks, and they're spectacular shows. It's something you don't appreciate when you're young, but when I see it in the summer now I'm amazed.

A Tale of an Industrial Town

The general feel of the core of St. Thomas is of a somewhat decayed, almost US rust-belt type town, which makes sense given that it saw the same boom-bust cycle as those US towns: After the geographical happenstance boom of the railway faded, the city relied heavily upon the automotive industry (the vulnerable St. Thomas Ford Assembly Plant sits nearby, and many supply firms were located in St. Thomas). This reliance on one of the most vulnerable sectors of the economy meant that every economic cycle was greatly amplified.

The smallest economic blip meant automotive industry shutdowns that reverberated through every industry in the town. During every downturn St. Thomas usually hosts an unemployment rate far above the average.

The general demeanor of the town was often one of paranoia and fear mongering, with the long-predicted closing of the Ford plant being the primary worry for years on end. The general attitude was often one of being impotent bit-players, awaiting the moves of some far off business executive to cast one's life asunder.

Throughout the town, gorgeous old turn-of-the-20th-century brick houses have been slashed into numerous apartments, and/or are falling apart due to neglect. Much of the city's retail space sits empty, or full of very low-end retailers (e.g. dollar stores). Other areas of retail have seen a resurgence (such as the building of a new retail complex featuring a new Walmart, Canadian Tire, among many others).

Resurgence as a Suburb

In other parts of the town, old houses are starting to be reclaimed and rejuvenated, and new subdivisions are sprouting up. St. Thomas is becoming a suburb of London.

Amazing to consider this -- when I was a kid in this town, London was some far off mythical place that one only ventured to under extreme conditions. Now it's rightly a short commute, and some wise Londoners are getting a steal for some structurally sound, beautiful old houses in St. Thomas' core.

Nonetheless, St. Thomas' industrial roots still show through. The town has far more big-old-pickups per capita than anywhere -- often featuring the name of the driver and his S.O. stenciled on the doors -- and smoking is far more prevalent than you find in the GTA. Owing to its automotive history, the town has an extraordinarily high prevalence of so-called "domestic" makes, with Hondas and Toyotas being a very rare sight indeed (something I never noticed until we invited a friend to a get-together in St. Thomas. He asked if he should worry about his Accord getting vandalized. The concept seemed alien to me, but looking around I was amazed to realize that probably 99% of cars on the road were of the "big 3". Here in Burlington, I doubt the big 3 account for more than 50% of cars on the road).

During a great breakfast at a local restaurant, I overheard a conversation that brought back flashbacks of why I was so desperate to leave the town as a teenager: A group was debating who was a "Chevy Person" and who was a "Ford Person". When I was a youth, such inane conversations, and bizarre corporate loyalties, accounted for most overheard conversation, and sales were brisk of "Calvin urinating on the opposing camp's corporate allegiance". This was the St. Thomas version of crips and bloods.

The Clarity of Distance (Time and Space)

The dearth of real opportunity, coupled with too much inane chatter (such as car brand allegiances) had me fleeing St. Thomas at the first opportunity, but looking at it now I've grown a fondness for it, warts and all.

  Personal 
Friday, March 24 2006

After announcing some more delays with Vista, and then a delay with Office 2007, and then a critical hole in IE, and then a restructuring of the entire Windows division, and then some negative press about Vista's usability, Microsoft is reeling right now, and things are looking down. 2006 isn't the year of Microsoft.

As much as I appreciate and understand that they're working on projects of a scope that dwarfs the largest projects most of us will ever touch, one thing that amazes me is seeing people continually defending Microsoft, saying "Well isn't it better that they hold it until they get it right?". Sure, but you're talking about the best choice at the end of a lot of terrible choices. Vista has been a disaster, and surely after this debacle Microsoft will take a cue from Apple and learn how to stream out incremental releases, underpromising and overdelivering.

About a year back, a Microsoft rep, as some sort of standard questionnaire, asked me what I thought the greatest problem with Microsoft was. My reply was that Microsoft ties too many of their products together, in a dangerous cross-relationship where each development group is riskily trying to design for the other, and each is critically endangered when there is a fault or delay in the other (e.g. rather than the OS team making the best OS, and the .NET group making the best application layer platform, and the video group making the best video group...each is trying to cater to the needs of the other during the design stage. It sounds great in theory, but it SELDOM works in reality).

Give me a call, Bill. I'll help you set things straight.

  Personal 
Friday, March 24 2006

I'm awaiting the availability of an updated .NET/.COM zone file for performance demonstration purposes (e.g. many of the samples for part III use the whole of the .COM/.NET DNS directories as performance samples). This is public data that people can replicate themselves, rather than confidential internal or client data, or manufactured data, so I thought it a good foundation.

I hope to finish up this series in the next couple of days.

  SQL 
Friday, March 24 2006

Like most people with a website, I regularly check the stats to see how things are going: How many people visited today? Where did they come from? How many times have the search engines sent someone my way? These are metrics that I use to know if I'm hitting internal goals, and allow me to alter plans when things head in the wrong direction (If I lose readers, I'll just have to make up some story about Google buying Digg and then duking it out with Microsoft Reddit Live! That sort of thing seems to play very well these days. Don't say I didn't warn you! Of course there's a bit of hypocrisy in the fact that I'm largely speculating about search algorithms, with very limited facts, while criticizing acquisition-of-the-hour rumors).

While the number of visitors has a fairly constant floor, the daily count ceiling can vary wildly if I've posted something new, if someone posted it to reddit or Digg or Slashdot, based upon how many people added things to their http://del.icio.us bookmarks, and so on.

A Wednesday might have 2200 visitors one week, while it sees 15,000 visitors the next.

Search engine referrals, in contrast, are usually fairly constant, with a generally predictable number being sent over, following a recurring weekly curve: Monday = X, Tuesday = X*1.1, Wednesday = X*1.15, Thursday = X*0.75, Friday = X*0.65, Saturday = X*0.4, Sunday = X*0.3. X has been slowly edging up as I add content, and as more inbound links appear and thus PageRank and similar rankings improve.
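
The weekly curve above is easy to turn into a quick expected-versus-actual check. This is only a sketch: the multipliers are the ones listed above, while the baseline X of 1000 and the "actual" figure are hypothetical numbers chosen for illustration.

```python
# Multipliers from the weekly curve described above, relative to Monday = X.
WEEKLY_MULTIPLIERS = {
    "Monday": 1.0, "Tuesday": 1.1, "Wednesday": 1.15, "Thursday": 0.75,
    "Friday": 0.65, "Saturday": 0.4, "Sunday": 0.3,
}


def expected_referrals(x, day):
    """Expected search referrals for `day`, given the Monday baseline `x`."""
    return x * WEEKLY_MULTIPLIERS[day]


def deviation(actual, x, day):
    """Fractional difference between actual and expected referrals."""
    expected = expected_referrals(x, day)
    return (actual - expected) / expected


x = 1000  # hypothetical Monday baseline
print(round(expected_referrals(x, "Wednesday")))
print(f"{deviation(1725, x, 'Wednesday'):+.0%}")
```

A deviation near zero means the day followed the usual curve; a large positive or negative value flags the kind of swing worth a second look.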

This week it hasn't been quite as predictable on the search engine front.

After a Sunday drop in search engine referrals, from Google in particular, on Monday Google referrals jumped 50% over the week before. On Tuesday they again jumped 50%. Then on Wednesday they dropped 20% under the mark set the preceding week. Again it came in 20% below on Thursday.

I would write this off as nothing more than normal fluctuations -- maybe users just weren't searching for the sort of content covered here on Wednesday and Thursday, so the referrals dropped off -- but for the fact that Monday and Tuesday coincidentally also saw a large influx of visitors from Reddit, Digg, Delicious, popurl, and a few other meme sites, quadrupling the normal traffic. Of course these new links were far too fresh to affect the PageRank, so by traditional analysis shouldn't affect the search referrals at all.

This got me thinking, in my normal conspiracy theory way: What if Google has started tying in site visits, metered by the Google toolbar (which sends back the sites you're visiting if you have the PageRank display on), and has begun using the current values to determine search results?

They could tune this in such a way that a site has to get a certain percentage of non-search referred visitors for each search referral, otherwise the search result is downgraded. The benefit, of course, is that search spam sites that only see visitors courtesy of the search engines would be quickly punished. "Valuable" content that is seeing significant non-search related traffic would be promoted.
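
To make the speculation concrete, here is a purely hypothetical sketch of such a rule. Nothing here reflects how Google actually ranks anything; the function name, the 0.25 threshold and the 0.5 penalty are all invented for illustration.

```python
def adjusted_score(base_score, search_visits, other_visits,
                   min_ratio=0.25, penalty=0.5):
    """Downgrade `base_score` when non-search traffic per search referral is too low.

    Hypothetical rule: `min_ratio` and `penalty` are made-up tuning knobs.
    """
    if search_visits == 0:
        return base_score  # no search referrals, so there is nothing to compare
    ratio = other_visits / search_visits
    return base_score if ratio >= min_ratio else base_score * penalty


# A site living almost entirely off search referrals gets downgraded...
print(adjusted_score(10.0, search_visits=1000, other_visits=10))
# ...while one with healthy non-search traffic keeps its score.
print(adjusted_score(10.0, search_visits=1000, other_visits=500))
```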

Just some food for thought. I have no proof of this, but I've always felt that there would come a time that their web visit stats would start to influence the search results.

Wednesday, March 22 2006

For, I believe, the 7th or 8th time, something from here has sat on the front page of Reddit for over 12 hours.

I find this amazing: I don't cheaply pander to the Reddit crowd (though some would say this front page hitting entry did, that was not my intention), and I don't write entries specifically targeting what I think would do well there.

I am honoured.

Of course if I wrote entries targeting what I thought would do well on Reddit, they'd probably do terribly, quickly getting moderated into the sub-zero range. Such is the nature of presumptions like that.

  Personal 