Some recently published statements regarding the Canadian Firearms Centre's online database, made by a former webmaster, have rightly earned a lot of attention: Mr. Hicks, an Orillia-area computer consultant, claims that he has identified several prior -- and possibly still remaining -- security gaps in the firearms registry, gaps that allow(ed) very sensitive information to be queried by anyone with a home computer and an internet connection.
If this is true, it betrays tremendous negligence in the creation and maintenance of this system, and while a lot of the attention is coming from the politically motivated, using it to further a pre-existing agenda, it doesn't diminish the seriousness of this event occurring in the first place.
No specifics are given; however, the likely culprit is an SQL-injection vulnerability.
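For readers unfamiliar with the term, SQL injection can be sketched in a few lines of Python. This is purely illustrative: the `owners` table, its columns, and the function names are invented for the example, and nothing here reflects how the actual registry was built.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical 'owners' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE owners (licence TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO owners VALUES (?, ?)",
        [("A100", "Alice"), ("B200", "Bob")],
    )
    return conn

def lookup_unsafe(conn, licence):
    # Vulnerable: the user-supplied value is pasted straight into the SQL text,
    # so a crafted value can change the meaning of the query.
    sql = "SELECT name FROM owners WHERE licence = '" + licence + "'"
    return [row[0] for row in conn.execute(sql)]

def lookup_safe(conn, licence):
    # Parameterized: the value is bound separately from the SQL text,
    # so it is only ever treated as data.
    sql = "SELECT name FROM owners WHERE licence = ?"
    return [row[0] for row in conn.execute(sql, (licence,))]
```

Passing a value such as `x' OR '1'='1` to `lookup_unsafe` turns the WHERE clause into a condition that is true for every row, while `lookup_safe` simply finds no licence with that odd name.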
More importantly, do people still call themselves "webmasters"? Is that really still a title?
While Mr. Hicks refers to the system as a "$15 million dollar system" in the linked article, its history is convoluted, and much more expensive (perhaps a digit was lost in editing). After purportedly giving EDS $151 million to make a working system, the government gave up and turned it over to a consortium of CGI, BDP, and Resolve Corp, giving them an estimated $100 million thus far.
This is to create a system to register 1.9 million gun owners with a combined seven million guns.
Accounting for extensive security and auditing -- of course mandatory for a system of this nature -- eforms, correspondence, web services, feeds for police stations, integration with legacy systems, web reporting and secure access, and so on, it still doesn't strike me as an overly complex project. The scope and capacity of data I've heard could be handled on a modern four-way SQL Server box with a half decent SAN. Add in a cluster backup, and you're still talking about less than $200,000 (with all software licenses). The actual custom software itself should be straightforward, given that data entry, data reporting, and data security are some of the most known, proven design elements in this business.
This is largely wizard-type stuff, for which they've purportedly paid $251 million thus far.
If an article in the National Post today ("A one-stop shop for gun thieves") is to be believed, the system crashed 90 times on the first day of testing, requiring their hardware to be completely reset 30 times that day -- an event that is unseen with the reliable platform stack we have nowadays. They called off the test and sent it back to development, sending all of the expensively flown-in testers back home.
Of course we don't know all of the obscure details of this project, and it is a certainty that trying to build a system for a rapidly changing government, with enormous changes in the root requirements, is more difficult than an average project, but I find it hard to fathom that it's $251 million different. I do appreciate that software developers often underestimate the tasks of other software developers and systems designers -- often with a foolish cowboy "I could do that on a weekend!" bravado -- but in this case I've designed and worked on systems of a similar scale, and I feel fairly confident in my assessment.
Given the very limited details that I've heard, I would have armchair estimated this as a less-than-$1-million project, hardware in. I would never imagine that it would pass a quarter of a billion dollars.
I've long been a Microsoft enthusiast, heartily embracing the platform and the development tools.
My first real professional development job was with Visual C++ Professional v1.0 (after years doing less professional work with tools like DJGPP) -- a product that came in a giant 50lb box full of huge reference manuals, along with a giant stack of floppies -- and my work and home life have predominantly relied upon various incarnations of Windows throughout the years, from 3.11 to Windows Server 2003 R2. I've personally pursued various certifications from Microsoft, and will hopefully be completing another in a few days. I've been developing in C, and then C++, and then .NET since the first beta, on the Microsoft platform, along with some deviant Win32-targeting Object Pascal Delphi work, relying upon great products like SQL Server, or subsystems like MSMQ, Active Directory, and DCOM, to build amazing solutions.
I've been branded a Microsoft astroturfer/paid-shill countless times on sites like Slashdot for speaking out against some rampant anti-Microsoft mistruths, and for defending some of Microsoft's actions (though I still haven't received a cheque from Microsoft for my volunteer advocacy...).
I've even written for Microsoft's premier development magazine.
Yet I have zero personal interest in Windows Internet Explorer 7*, beyond professional observation. Perhaps it'll have some yet unannounced amazing new innovation when it's eventually released, but as it is it's nothing more than an also-ran, finally bringing functionality that competitors such as Firefox and Opera have had for years. Other functionality, such as the sandbox model IE will have on Vista -- which they've built for the inevitable exploits that will follow -- relies upon operating system shims that only Microsoft has the privilege of adding. Presumably this same functionality will exist for alternate software products as well, so there's no reason -- beyond the type that the Justice Department would take interest in -- that Firefox and Opera won't gain the ability to utilize the functionality.
If users are waiting with bated breath, living with their half-a-decade-old Internet Explorer 6 in anxious anticipation of Microsoft finally putting some care into their browser, they need to seriously ask themselves why they haven't considered or evaluated the superior alternatives that are freely available. IT departments that simply coast along with whatever their Microsoft rep has decreed as acceptable need to ask themselves the same thing, and blanket decrees such as a banning of Firefox on corporate machines need credible justifications, and not just some baseless fear-mongering by a group that doesn't want the bother.
Internet Explorer wasn't always such a boring product. The period of greatest innovation with Internet Explorer happened in the IE 4 and IE 5 timeframe, when we gained functionality such as XML, XML data islands, the foundation of AJAX (if you had the luxury of only targeting IE 5+, you could build web apps in 1999 that rival the most "innovative" Web 2.0 sites today), and advanced CSS and DOM functionality implemented simultaneously with, or ahead of, competitors. This was when the team seemed to have free rein, and when its primary motivation appeared to be creating a great browser, rather than the oft-claimed conspiracy of building Microsoft tie-ins -- in fact the product was cross-platform, bringing a great browser to the Mac, for instance.
Of course, then they were trying to win the browser wars, and the result was the quick decimation of Netscape's marketshare. Microsoft's best minds rapidly created a killer web browser to kill a competitor in the web browser market, and there is no doubt that they technically succeeded, evolving their browser much more rapidly than the quagmired Netscape browser.
Even with the first-rate team working on what was the premier browser, the market was still very slow to adapt: Microsoft had so thoroughly intertwined the browser in the operating system that upgrading became a potentially dangerous operation. It's for this reason that old versions of Internet Explorer lived on long past their presumed expiration date, with IT departments hesitant to upgrade. This system interweave yielded some advantages, such as embedded browsers in divergent applications such as Quickbooks, yet it came at the cost of greatly reduced agility of the foundation. Compare this to a product like Firefox that exists largely as a software island, where uptake of new, feature-enhanced versions happens at an extremely rapid pace. Taking advantage of the new functionality in Opera 9 or Firefox 2(*2) would be no more risky, for most users, than upgrading their copy of WinTetris.
Microsoft won the browser war, and seeing how this new platform could actually undermine their own business, and reduce dependency on the Windows platform, the team was dispersed far and wide. All work on Internet Explorer, outside of emergency security fixes, was stopped. The internet world that had now come to rely largely upon the rapidly evolving Internet Explorer now saw absolutely no progress, while inside Microsoft they strategized how best to build Windows-specific technologies to pull developers and users back (such as XAML and one-touch deployment), tying them once again specifically to the Microsoft platform.
Five-plus years on, the tide is slowly shifting: Firefox is rapidly gaining marketshare, while the capable Opera browser continues to idle at a low level. Among sites catering to the IT/software development market, Firefox use is dominant. Public websites that demand Internet Explorer are quickly going extinct, and those that remain cast considerable doubt on the prowess of their creators.
Even if Internet Explorer 7 were a much more exciting product than it has proven thus far, I would still advocate against it.
We saw previously how Microsoft served the browser market only while it was in its interest, and then promptly abandoned its users when it wasn't, and there is no reason to think the same won't happen again. Having users rush to Internet Explorer 7, killing interest in (and thus the speed of development of) competitors, won't do the web any good when Microsoft promptly stops development again, enticing you to dump this crazy web thing and embrace the next evolution of fat apps. Given that the browser is largely contrary to Microsoft's business interests, it seems an outcome that is inevitable.
Indeed, Internet Explorer 7 was originally only slated to come out for Longhorn (now Vista), as a sort of carrot to interest users in the otherwise boring upgrade; however, the endless slips of Vista, coupled with rumors of Google entering the browser fray (which they have indirectly through some healthy financial support of the Mozilla Foundation), led them to revise their plans. Yet it still remains that some of the most valuable improvements of IE 7 will only be available if you upgrade to Vista (so if you're running IE7 on XP, you're running a sort of IE7-lite). Compare this to Firefox, where the exact same browser, and largely the same set of superlative extensions, runs on a huge range of operating systems, from obsolete to cutting edge: Firefox has no agenda to get you to upgrade your operating system, so such a differentiation doesn't exist, and you can take advantage of advanced canvas elements and SVG right now.
Why You Shouldn't Care About Internet Explorer 7
*- Based upon the great success they had with the .NET marketing wave, Microsoft is now widely branding their products and technologies with the prefix "Windows", so instead of Microsoft Internet Explorer (MSIE), it's Windows Internet Explorer (WIE? WinIE?), or perhaps Microsoft Windows Internet Explorer. This is to try to get the unrecognized name "Windows" out in the marketplace.
*2 - Apart from Firefox extensions, which are becoming a bit of a problem with each new version of Firefox. The break rate of extensions is so high that it's creating the sort of resistance to change that used to happen with Internet Explorer. The Firefox team really needs to solidify their API, allowing new extensions to take advantage of newer interfaces without breaking the existing extensions.
When time affords, I've been looking over two widely hyped web betas - Riya and Ether.
Riya is a "next generation" Flickr online photo web app, enabled with facial and text recognition. Expectations about the facial and text recognition have hugely scaled back from the hysterical claims being made months ago (now they're giving long lists of caveats, warning that it's just getting started and that you should focus more on the photo album capabilities). Given that this is supposed to be the site's killer feature, it will be interesting to see where they go with that. I'm going to try it out with a load of test photos and see how it compares.
Ether is a rather interesting service that arranges calls through phone service providers and clients, charging a, ahem, "pimp fee" for the service. It's quite a clever idea, and can greatly simplify the infrastructure and billing requirements for small phone service providers. On the downside it requires callers to register with Ether, which is a requirement that will definitely reduce acceptance (if it was "976" style billing, where the billing automatically goes on the phone bill, I would imagine it would do better, though of course that isn't really possible globally).
I can't really see a huge market for this service, but they've built quite a nice web app for the system, and the phone infrastructure seems to work well.
Potentially it could be used for easily billing out phone support for those who follow the software development model of "release the software for free and then charge for support".
Dennis Forbes: 1-888-MY-ETHER ext. 01384785
(Note: I don't actually expect anyone to call that, as telephone style services aren't really my forte. Perhaps if I got in the astrology industry it would have more utility).
On the topic of neat web services, and while thinking of Ether, there's another clever one I was pointed to recently - http://www.jajah.com/.
Like most people with a website, I regularly check the stats to see how things are going: How many people visited today? Where did they come from? How many times have the search engines sent someone my way? These are metrics that I use to know if I'm hitting internal goals, and allow me to alter plans when things head in the wrong direction (If I lose readers, I'll just have to make up some story about Google buying Digg and then duking it out with Microsoft Reddit Live! That sort of thing seems to play very well these days. Don't say I didn't warn you! Of course there's a bit of hypocrisy in the fact that I'm largely speculating about search algorithms, with very limited facts, while criticizing acquisition-of-the-hour rumors).
While the number of visitors has a fairly constant floor, the daily count ceiling can vary wildly if I've posted something new, if someone posted it to reddit or Digg or Slashdot, based upon how many people added things to their http://del.icio.us bookmarks, and so on.
A Wednesday might have 2200 visitors one week, while it sees 15,000 visitors the next.
Search engine referrals, in contrast, are usually fairly constant, with a generally predictable number being sent over, following a recurring weekly curve: Monday = X, Tuesday = X*1.1, Wednesday = X*1.15, Thursday = X*0.75, Friday = X*0.65, Saturday = X*0.4, Sunday = X*0.3. X has been slowly edging up as I add content, and as more inbound links appear and thus PageRank and similar rankings improve.
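As a rough illustration of that weekly curve, here is a minimal Python sketch. The multipliers are the ones listed above; the function names and the idea of measuring a day's deviation as a fraction of the expected value are assumptions made for the example.

```python
# Multipliers relative to Monday's referral count X, as listed above.
WEEKLY_MULTIPLIERS = {
    "Monday": 1.0,
    "Tuesday": 1.1,
    "Wednesday": 1.15,
    "Thursday": 0.75,
    "Friday": 0.65,
    "Saturday": 0.4,
    "Sunday": 0.3,
}

def expected_referrals(monday_baseline):
    """Return the expected search referrals for each day, given Monday's X."""
    return {day: monday_baseline * m for day, m in WEEKLY_MULTIPLIERS.items()}

def deviation(actual, expected):
    """Fractional deviation from the expected count (0.5 means 50% above)."""
    return (actual - expected) / expected
```

With a hypothetical Monday baseline of 1000, `expected_referrals(1000)` gives the whole week's curve, and `deviation(actual, expected)` flags the days that stray from it.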
This week it hasn't been quite as predictable on the search engine front.
After a Sunday drop in search engine referrals, from Google in particular, on Monday Google referrals jumped 50% over the week before. On Tuesday they again jumped 50%. Then on Wednesday they dropped 20% under the mark set the preceding week. Again it came in 20% below on Thursday.
I would write this off as nothing more than normal fluctuations -- maybe users just weren't searching for the sort of content covered on here on Wednesday and Thursday, so the referrals dropped off -- but for the fact that Monday and Tuesday coincidentally also saw a large influx of visitors from Reddit, Digg, Delicious, popurl, and a few other meme sites, quadrupling the normal traffic. Of course these new links were far too fresh to affect the PageRank, so by traditional analysis shouldn't affect the search referrals at all.
This got me thinking, in my normal conspiracy theory way: What if Google has started tracking site visits, metered by the Google toolbar (which sends back the sites you're visiting if you have PageRank display on), and has begun using the current values to determine search results?
They could tune this in such a way that a site has to get a certain percentage of non-search referred visitors for each search referral, otherwise the search result is downgraded. The benefit, of course, is that search spam sites that only see visitors courtesy of the search engines would be quickly punished. "Valuable" content that is seeing significant non-search related traffic would be promoted.
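To make the speculation concrete, here is a toy Python sketch of such a rule. Everything in it is hypothetical: the function name, the threshold, and the penalty factor are invented, and nothing here is known to resemble how any search engine actually ranks pages.

```python
def adjusted_rank_score(base_score, search_referrals, other_visits,
                        min_ratio=0.5, penalty=0.5):
    """Downgrade a score when too few visits arrive from outside search.

    min_ratio and penalty are made-up tuning knobs for illustration only.
    """
    if search_referrals == 0:
        # No search traffic to compare against, so leave the score alone.
        return base_score
    ratio = other_visits / search_referrals
    if ratio < min_ratio:
        # Mostly search-fed traffic: treat as a possible spam signal.
        return base_score * penalty
    return base_score
```

A site with 1000 search referrals and only 10 other visits would be demoted under these made-up settings, while one with 800 other visits would keep its score.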
Just some food for thought. I have no proof of this, but I've always felt that there would come a time that their web visit stats would start to influence the search results.
You've thought up a brilliant idea for a new Web 2.0, AJAX-enabled web app, or you're about to release a thus-far-unnamed killer software app. Now you just need to find the perfect domain name for it to live at (and, in true new-economy fashion, you'll base your corporate name upon whatever available domain name you find... PILLAGEANDPLUNDR Corporation).
You pull up GoDaddy and start punching in clever names, along with their many variations, only to find that they're all seemingly taken.
"This can't be!" you cry. "Has every possibility already been registered?"
Given that there are approximately 50 million .COM domains registered, it is indeed true that the low-hanging fruit domain names are overwhelmingly taken. Your chances of lucking upon an unnoticed available three-letter acronym (TLA) are close to zero, and your only recourse would be to haggle with domain speculators.
If you want one of the 676 possible two-letter sequences, for instance for an acronym or abbreviation, you're out of luck: They're all taken. Even allowing for digits, giving 1296 combinations, again every single variation is taken.
Of course, that's ignoring the fact that .COM registrars now mandate a 3-character minimum length, so it wouldn't be an option anyways.
Of the 17,576 possible three-letter sequences, again every single one is already taken. Adding digits to the mix (note that I'm intentionally ignoring obtuse dashes for such short domain names, though technically they are legal from the second character onwards), giving 46,656 permutations, yields a larger number of garbage domain entries (either REGISTRAR-LOCKED, REDEMPTIONPERIOD, or with no nameservers), giving a false hope of 228 seemingly open domains, yet they aren't actually available.
If you're dying to acquire great domains like 8VZ.com or Q6X.com, they'll free up within a month, though it seems evident that there are swaths of domain speculators acquiring every variant when they come available, so they won't go without a fight.
Stepping up to four-letter sequences, choosing among the 456,976 combinations, yields a vastly greater availability -- perhaps the set is a bit too large for domain speculators and their unlikely success with random sequences -- with 97,786 showing as open. A quick check verifies that most are legitimately available, including "choice" domains such as AGJV.com, EIYK.com, GZVW.com, and QFEV.com. Add digits into the mix and there are a massive 1.16 million open domains, so long as you're looking for something like 7RG8.com, or U3JZ.com. Choose one and then manufacture a ridiculous backronym to explain it.
Going to 5-letter sequences (yet another five-letter acronym? YAFLA?), and of course the possibilities are rich, again presuming that you're willing to accept an arbitrary sequence of letters and/or digits, creating a backronym to match. Using just letters you have a rich 11,881,376 possibilities, of which approximately 11,015,028 are unclaimed.
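The totals quoted above are just powers of the alphabet size: 26 for letters only, 36 once digits are allowed. A quick Python sketch to reproduce them (the function names are made up for the example, and the open-domain counts are simply the figures quoted in the text):

```python
LETTERS = 26        # a-z
ALPHANUMERIC = 36   # a-z plus 0-9

def sequence_count(length, alphabet_size):
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length

def percent_open(open_count, total):
    """Share of the name space still unregistered, as a percentage."""
    return 100.0 * open_count / total

# Print letters-only and letters-plus-digits totals for lengths 2 to 5.
for n in range(2, 6):
    print(n, sequence_count(n, LETTERS), sequence_count(n, ALPHANUMERIC))
```

For instance, `percent_open(97786, sequence_count(4, LETTERS))` puts the open four-letter names at a little over a fifth of the space.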

Of course many of the registered domains are seldom, if ever, visited, with a huge percentage having nothing more than a parked page (users pay domain registrars to put up ads for themselves). Thus, analyzing the domain database without taking into account popularity/traffic is of limited value, but it does provide for a bit of entertainment.
As mentioned, 100% of 2- and 3-letter domain names are taken, but it starts to free up as the number of possibilities explodes, all the way up to 63-character domain names. The most popular registered domain name length is actually 11 characters long, tailing off from there.

The fun doesn't end at 31 characters, however. There are 253,000+ non-IDN domains that are 32 characters or longer, including 538 that are 63 characters long.

These include such superlative domains as ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ.com, WEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEBWEB.com, and DIDYOUKNOWTHATYOUCANONLYHAVESIXTY-THREECHARACTERSINADOMAIN-NAME.com.
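A length distribution like this comes down to a simple tally. Here is a minimal Python sketch, assuming the registered names are available as a plain list of strings; the function name and the sample data are invented, and this is not the query used for the figures above.

```python
from collections import Counter

def length_histogram(domains):
    """Count domains by the length of the name before the final dot.

    For example, 'example.com' counts as length 7.
    """
    return Counter(len(d.rsplit(".", 1)[0]) for d in domains)

# Tiny made-up sample standing in for the real domain list.
sample = ["abc.com", "abcd.com", "WXYZ.com", "somewhatlonger.com"]
histogram = length_histogram(sample)

# most_common(1) returns the single most frequent (length, count) pair.
most_popular_length, count = histogram.most_common(1)[0]
```

Run against the full database, `most_common(1)` would yield the most popular length, and filtering the keys for 32 and above would give the long tail.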
The US Census Bureau has some handy common name files available on their site, so I thought I'd see how one's luck would be trying to register their own name(s).
If you're looking for a masculine domain name, you'll be disheartened to learn that of the 1219 male names listed by the US Census Bureau, every single one is registered. If you're looking for something feminine, you're in luck: As I type this, of the 2841 female names listed by the Census, two are about to free up. You can soon grab the lucrative, recently expired Erlinda.com, or the sitting-in-purgatory Shanita.com, though both are technically still taken.
On the family name front, 100% of the top 10,000 family names are registered.

Cross joining the top 300 male names with the top 300 family names finds that ~10,112 of the 90,000 possibilities aren't registered, to the benefit of anyone named Antonio Hughes and Lawrence Torres out there! Similarly, cross joining the top 300 female names with the top 300 family names finds that ~14,103 possibilities are unclaimed.
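The cross join itself is easy to picture: pair every first name with every family name, then discard the pairs that are already registered. Here is a hedged Python sketch of the idea, in which the function name, the sample names, and the `registered` set are all stand-ins for the real data.

```python
from itertools import product

def unregistered_full_names(first_names, family_names, registered):
    """Return first+family combinations not present in the registered set.

    `registered` is assumed to hold lower-case names without the '.com'.
    """
    available = []
    for first, family in product(first_names, family_names):
        candidate = (first + family).lower()
        if candidate not in registered:
            available.append(candidate)
    return available

# Tiny made-up sample; the real run used the top 300 names from each list.
first = ["Antonio", "Lawrence"]
family = ["Hughes", "Torres"]
registered = {"antoniotorres", "lawrencehughes"}
open_names = unregistered_full_names(first, family, registered)
```

With 300 names on each side, `product` yields the 90,000 candidates mentioned above.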
On the love front, 1958 (68.9%) of the 2841 possible 'ILOVE'-prefixed female names (using the census set of names) sit unclaimed, which is surprising, as only 665 (54.5%) of 1219 'ILOVE'-prefixed male names remain available.

Continuing down that path, the seedier side of the internet is hardly a secret, and it's evident in the DNS database as well. 268,971 domains contain the sequence SEX (11,333 of them also containing the sequence FREE), while 143,683 domains contain the sequence LOVE.

The most common letter to start a domain is S, with relatively few domains starting with Q, X, Y or Z.

The most common digit to start a domain is, unsurprisingly, 1.

Every successful company has remoras and haters, so it was interesting to look at the number of suffixed alternatives for some well-known domains. While some of these are actually owned by the root domain owner, most are hangers-on and critics.

Samples include GOOGLE-AMERICA, GOOGLE-BUDDY, MICROSOFT-EBOOKS, SLASHDOTREVIEW, SLASHDOTSLASH, and YAHOO2007.
Hopefully this was a bit entertaining, and maybe even informative. I'm doing a much more intriguing, large-scale analysis (again, it's a nice opportunity to demonstrate some of the new SQL Server 2005 functionality) that I'll publish soon, but these were the low-hanging fruit.
[Also see Domain Name Analysis - More Fascinating But Entirely Useless Charts]
I've been extremely busy professionally over the past week, so I apologize for the lack of content. The quiet is also admittedly because it's hard to follow-up the domain name entries, given the extraordinary level of interest they received.
Apart from 30,000+ visitors to those entries, per day, continuing for about a week (and still tapering off), I was also phone interviewed on National Public Radio (broadcast throughout the US), quoted in an ezine, translated to other languages, linked by several hundred other sites (including the blogs of several people in this industry whom I've admired for many years), and parts of the entry are going to be published in a reputable magazine.
That interest completely shocked me.
I obtained the domain name database for purely functional reasons, and threw up the entry of observations purely because I found a couple of the stats interesting (I love digging into data and finding interesting correlations and insights. I imagine how interesting it would be to delve in some of the large datasets like grocery store databases: Who doesn't look in the cart in line ahead of them, drawing conclusions about the personality and lifestyle of the individual based upon their purchases? Imagine all of the fun observations one could derive from the entire database of purchases).
At most I thought the regulars would find it interesting, and was shocked to see the level of traffic. Apart from all of the wonderful comments I've received, and publicity for my consulting/software development business, the benefit to PageRank has been tremendous, and search engine referrals are through the roof.
In any case, I have several entries almost ready for publication, so content should ramp up again shortly.
Have a fantastic day and week ahead.
Data security has been on my mind lately, mostly after learning that approximately 700,000 laptops are stolen in the US per year. Add the armies of desktops stolen, the backup tapes lost, and the system compromises that occur, and the situation starts to look pretty grim for data security.
How secure is your data?
If someone stole your desktop, or snatched your laptop from under you at a coffee shop, what confidential information could they gain?
While most thieves don't have the capacity or motivation to crack the syskey or circumvent NTFS permissions (which is as easy as booting up with a Knoppix disc. File ACLs only matter if the expected host operating system is in charge), your response should be to assume that they do, and that they are now reading all of your documents, looking at all of your shortcuts and form entry values, browsing your Outlook notes of account numbers and passwords, and are playing with your tax returns.
The real-world cost of such a compromise can be extraordinary. Losing an expensive piece of equipment can be annoying, but it pales compared to the wholesale loss of data privacy.
Do you use EFS (more information here)? Do you have a Data Recovery key with the private key stored offline in a protected location? Do you know what syskey does? Are you aware of the upcoming Secure Startup (which basically is whole volume encryption)?
Are you comfortable enough with your procedures that the physical loss of a computer to theft would be nothing more than a financial expense and setup hassle, with marginal or no data exposure?