While I've used Linux for years in the virtual machine space, primarily as a "native" location for many of the UNIX-style command line tools (even if I'm using them against an SMB share on the network, they're still extremely helpful), it has never been a primary operating system on any of my PCs outside of novelty "let's see what stage they're at" usage. Where I did use a Unix variant, it was always FreeBSD.
I've decided to give it another go, this time in an "Internet Appliance" of sorts.
This opportunity arose when a hard drive in an old eMachines Athlon 2400+ 512MB WXGA laptop I had kicking around died. It was running warmer and warmer until finally it stopped running altogether. Now it's just lounging on a park bench.
Given that the laptop saw limited use -- mostly for my daughter to play PeepAndTheBigWideWorld games online -- this didn't cause too many tears, and it gives me an opportunity to pursue the internet appliance desire (perhaps in the kitchen: something that is zero maintenance or security worry, hopefully low power, and that allows for rich web browsing for the family and guests) that I've been thinking about for some time. What I'm considering is running the laptop diskless (meaning no hard drive or floppy drive, though there will be an optical drive -- unfortunately this particular laptop - an M5312 - can't boot off of the USB key, though I'm keeping my eye out for possible BIOS options. Note that I do not want a hard drive, so I'm not looking for replacement options for it. This is a fun effort, and as I already have another working modern laptop, I just think a diskless device, minus the noise, heat and power consumption of the hard drive, would be pretty cool).
After some futile efforts with Knoppix, my initiatives thwarted by the fact that it refused to work with the Broadcom wireless (either with the native driver or with the ndiswrapper driver), I switched over to Ubuntu's "LiveCD" desktop version and it works amazingly (though the wireless worked much better with ndiswrapper and the Windows Broadcom bcm43xx driver). With little hassle the machine is booted from the CD, the wireless is operating (with WPA and everything), Firefox is updated with Shockwave and various updates, and it's brilliantly usable with complete mobility throughout the home, low power, low noise, and limited heat. I could duct-tape it to a wall if I wanted to.
The only problem is that it's a temporary state, and once I shut down (or there's a power outage), I need to start from the base CD image once again. The latest release of Ubuntu does have a rather sketchy persistent option where, with a bootup option (which is lame), it basically loads everything from the CD and then overlays the contents of a USB key image, so you're still starting from the base and then consuming some of the ghost filesystem with the delta. This is on top of the fact that the persistent functionality uses the USB key almost like a R/W filesystem, purportedly constantly writing changes, which would lead to a very short key lifespan. Also, I really don't want to save continually changing state, but rather want to choose a perfectly configured point and solidify it, with each reboot starting exactly there.
What I'd really like to do is to configure the machine and somehow persist that onto a new boot CD, such that the starting state is exactly what I want (realizing that in the future, I may need to make newly updated CDs). Indeed, it is possible to make your own Ubuntu bootup CDs, and I've done so with some success. However, I need to get the changes from this live, no-persistent-storage laptop onto the image for burning onto CD (many of the configuration steps are making changes unknown to me, so I can't just modify a couple of /etc .conf files). So I've got an Ubuntu virtual session running*, and I've extracted the boot-image filesystem, but it's the process of getting the laptop image over to the image for burning that I'm unsure of. I've tried rsync to dismal failure.
If anyone has any ideas, I'd greatly welcome them. The laptop has a 1GB USB key which it can access, and working wired and wireless connectivity. On the other end is an Ubuntu session with plenty of RAM and storage, ready with a decompressed image to be prepared for burning into the ultimate, personalized LiveCD.
* - the process described to make a custom Ubuntu boot CD -- such as creating a filesystem within a file, and then copying over from the compressed filesystem -- completely borked VirtualPC...multiple times. Not to mention that performance in general was atrocious. I then tried under VMware Server (the free product) and it worked absolutely perfectly, and the performance was enormously faster.
Normally I'm a fan of VirtualPC, but from here on in I'll be doing all Linux work under VMware.
Why is it that "90% done" (and its partner in crime - the ubiquitous "almost done!") is the progress report for virtually any project, over virtually all of its life-cycle?
Why has 90% become the fictional number of choice? Why not the more conservative 80%, or the bolder 95%? Given that it usually has little correlation with reality, they're just as real.
Projects should be reported as 87% done. Even when there's the ominous "we'll solve that problem when we get to it" task maliciously eyeballing you from later in the project plan, or the "it doesn't work and we have no clue why?" runtime reality, still say 87% with confidence and pride.
Joel Spolsky, the well-known blogger and ISV owner, kicked up quite a storm recently with his piece entitled Language Wars [for those following the `debate', yes, I'm late to the party on this. I make it a general standard to avoid responding to blogs on here -- the whole blog thing is entirely too recursive -- but some recent reactions to his piece pushed me to post].
The article leads off with some pragmatic wisdom, advising enterprise-y, low-risk type shops to use well-known and well-proven technology stacks -- solid advice that's hard to argue with -- yet he then ends the piece with a comment about an in-house, next-generation, super-duper language being used to develop Fog Creek's premier product, FogBugz.
The discord was so great that most readers presumed that the Wasabi thing was a joke, or alternately that the rest of the article was the joke (which would have been an awesome revelation). Much confusion ensued, to the point that Joel had to put up a post clarifying that he was actually serious about the Wasabi thing.
Aside from the seeming hypocrisy, what really instantiated some JoelCritic<T> instances (via the BlogCriticFactory) were Joel's comments about Ruby, where he seemingly indicated that it wasn't ready for prime time.
...but for Serious Business Stuff you really must recognize that there just isn't a lot of experience in the world building big mission critical web systems in Ruby on Rails, and I'm really not sure that you won't hit scaling problems, or problems interfacing with some old legacy thingamabob, or problems finding programmers who can understand the code, or whatnot...
...I for one am scared of Ruby because (1) it displays a stunning antipathy towards Unicode and (2) it's known to be slow, so if you become The Next MySpace, you'll be buying 5 times as many boxes as the .NET guy down the hall.
I'm sure Joel anticipated the backlash. Perhaps it was even the motivation behind the posting: The resulting torrent of discussion brought quite a few visitors to his blog, and earned him a lot of inbound links, both of which have definitely helped with his new business ventures. No publicity is bad publicity, they say, especially if it's timed to coincide with the launch of a new job board (as an aside, Ruby, Wikipedia, OSX, Python, Lisp, and Erlang are all terrible! People with the letters J or P in their names are jerks!).
Ruby is still new enough, and with a small enough community, that many of its users double as evangelists -- think of the Amiga computer, the BeOS operating system, or any other contextually-superior alternative embraced by a small enough group that many feel an ego-intersection with the technology, motivated to defend and advocate it when the opportunity arises. Linux once had such an attack-dog core of rabid enthusiasts, though as the user base has grown, and it has become more pedestrian, you really have to target a Linux-niche (such as a little used distro) if you're aiming to stir up a hornet's nest.
That entire lead-up was just some context for the actual topic of this entry: So-called premature optimization.
A common response to Joel's complaint that Ruby is slow or resource inefficient is the frequently incanted declaration that such complaints are nothing but "premature optimization!"
I've seen the same deflection shield used to defend abhorrent database designs, convoluted, overly-abstracted class designs or message patterns, and virtually anything else where a realist might proactively ponder "but won't performance be a problem doing it like this?", only to yield the response "You know, premature optimization is a classic beginner's mistake!"
If you don't want to be lumped in with beginners, the lesson goes, it's best to pretend that performance simply doesn't matter. We'll cross that bridge when we get to it.
Premature optimization is the root of all evil (or at least most of it) in programming.
Donald Knuth
I remember the early days: I once spent about 16 work hours optimizing a date munging function, increasing its performance from something like 2 million iterations per second to 4 million iterations. In the grand scheme of things, the performance difference was completely negligible, but from the perspective of artificial benchmarks it seemed like tremendous progress was being made.
That was premature optimization.
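To put a rough number on "negligible", here is a back-of-the-envelope sketch in Python. The rates and the 16 hours come from the anecdote above and are approximate ("something like"); the idea of a "break-even" call count, and the crude assumption that an hour of developer time can be traded one-for-one against an hour of machine time, are mine for illustration only.

```python
# Figures taken from the anecdote; both rates are approximate.
before_rate = 2_000_000   # calls per second before tuning
after_rate = 4_000_000    # calls per second after tuning
hours_spent = 16          # work hours invested in the tuning

# Seconds of CPU time saved on each call.
saved_per_call = 1 / before_rate - 1 / after_rate

# Crude assumption: one second of developer time equals one second of CPU time.
break_even_calls = hours_spent * 3600 / saved_per_call

print(f"saved per call: {saved_per_call * 1e6:.2f} microseconds")
print(f"break-even after about {break_even_calls:.2e} calls")
```

Under those assumptions the function saves a quarter of a microsecond per call, and would have to be called on the order of a couple hundred billion times before the saved CPU time matched the time spent tuning it.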
Indeed, anyone who's done time in the software development industry can identify with what Mr. Knuth was saying, probably having been involved with (or responsible for) project plans gone awry when efforts focused on highly-complex caching infrastructures, or ultra-optimizing some seldom used edge function.
Yet what is arguable, and situation specific, is deciding what qualifies as premature, versus what are simply proactive, predictive, professional performance prognostications.
NOT ALL PERFORMANCE CONSIDERATIONS ARE PREMATURE OPTIMIZATION!
While there is no doubt that there is such a thing as premature optimization -- it is an evil distraction that sidetracks many projects -- there are critical decisions made early in a project that can cripple the performance potential (both resource efficiency, and resource maximum), making later optimizations enormously expensive, if not impossible without an entire rewrite.
Whether it's heavily normalizing the database (or its nefarious doppelgänger, the classic database-within-the-database: "This single table can handle anything! Just put a comma separated array of serialized objects in each of the 256 varbinary(max) columns! Look at the flexibility! Query it? Don't you bother me with your premature optimizations!"), creating an application design that's incongruent with caching, or choosing an inefficient platform, there are credible performance considerations that need to be addressed at the outset, and revisited as development proceeds. It is absolute insanity, and entirely irresponsible professionally, to simply stick one's head in the sand and hope that some magical virtual machine improvements or subcolumn indexing decomposition and querying technology will occur before deployment, or before the economics of scaling come into play.
And speaking of scaling, the canard that the horizontal scalability intrinsic to most web apps (unless you really screwed up the design -- as many people do -- and made horizontal scalability impossible) makes the problem a nonissue is absurd: Perhaps if your project has a high transaction value then you have the luxury of adding more servers to serve a small number of clients, yet for most real-world projects adding resources is a big, big deal. And it isn't simply the cost of a low-end Dell 1850: Whether you're colocating or hosting in an expensively rigged corporate server room, the cost of each server is substantial.
You end up in the dilemma that you're financially (or physically) limited to a set quantity of resources, having to limit or scale back the functionality provided to each user due to the inefficiencies caused by early decisions. "Sorry, we can't implement those cool AJAX type-ahead lookups because the callbacks would kill our servers - we're already saturating them with our stack of inefficiency, so there's no overhead left."
I think the lackadaisical attitude towards efficiency is a result of experience derived from countless unvisited or seldom used web apps deployed across millions of PCs, colocated with equally spartanly used peers. When a site sees a dozen visitors in a day, it's easy to declare that performance is a seeming nonissue nowadays - that it's only a concern for game programmers and nuclear modelling engineers. Then one day the page gets mentioned on Digg or Reddit or Slashdot or BobOnHardware and in that potential moment of glory the app falls over and dies, again and again.
None of this really has anything to do with Ruby. Personally I haven't used it beyond the tutorials, though I do know that it does very, very poorly on the standardized benchmarks. However, it is distressing seeing so many people dismiss Joel's comments (or comments about Python, or Erlang, or XML, or any other technology) as premature optimization.
I've never been a fan of The Daily WTF.
While I can appreciate that it might elicit an occasional chuckle, and may even serve as a "here be dragons!" warning of the dangers of bad code or UIs, something about it just rubs me the wrong way.
The root of my dislike, I think, is my feeling that there's a bit too much of a schadenfreude thing going on much of the time. While I'm sure that the site is run and visited by a lot of great people, many of the regulars seem a little too eager to bask in the foibles of others, presumably imagining themselves righteous fonts of perfection and clarity.
Even the most benign and negligible choice of bitmasks is apparently worthy of mockery.
That would be fine and good, and it would just be a site I don't visit. However, a bit of a meme lately (given that there's a current SuperStar! Developer! thing going on) is something along the lines of "Do you want a great programmer, or someone whose code appears on TheDailyWTF?" This is a repeating theme: On the one side are the great programmers, and on the other are the people endlessly bound to give TheDailyWTF source material.
Do people really think such a schism exists? Is the impression that great developers are infallible, never creating any bad code at all, ever? Are bad programmers just stumbling from one WTF to another?
Of course not.
I fear the output of any developer who claimed that they've never written bad code. I would fear them because they're either bald-faced liars -- believing that simply saying it repeatedly will somehow convince others into this fiction -- or they're completely blind to their own weaknesses.
Every developer in the real world has had bad days, brain faults, or bad interpretations of new languages, environments or libraries. It's simply a given of the profession.
Building a myth of perfection fools no one.
I'm a big fan of fall fairs.

The greasy food. The animal displays. The cute baby contests. The best-of-category vegetable competitions. The tractor pulls. How couldn't you love a good agricultural society fall fair?

And it isn't like I wallow in it, imagining myself a know-it-all, seen-it-all urbanite taking a chance to get a chuckle at the rural folk. In actuality I often marvel at how ideal and livable many of these small towns are.
Lucky for me and the family then that Ontario has a rich calendar of fall fairs occurring over the coming weeks.
Which brings me to websites - we choose which fall fairs to attend each season based entirely upon their websites (linked from entries in that directory). It gives us comfort that we're not going to drive an hour to find a shack with a couple of pigs running free, and of course with dozens of fairs occurring (and a high degree of mobility allowing us a wide range of possibilities) we want to pick the most interesting one each weekend. While I doubt that all that many of their visitors are provincial tourists browsing the OAAS website, it does seem to be a good idea to spend just a little time on the website.
Which brings me to this fair. Contained on a largely useless website (no offense intended to the author -- I doubt it was their highest priority) is the notice that they couldn't post the prize list because of lack of resources. They follow it up with a petition for volunteers.
Really, how tiny must a community be that there isn't anyone willing to scan a printout, or better yet type it in? Doesn't this town have a single eager-to-look-like-a-hacker 11-year-old (or 40-year-old for that matter, or 70-year-old)? How is it even possible that they can't find resources for this?
In any case, it's clear that the fall fair industry is desperately in need of a good content management system. I'm surprised that the OAAS doesn't offer them a half-decent shell to advertise their event ("Please check off each event your fair contains - baby beauty contest [ ] apple bobbing [ ]"...).
While Benford's law (a.k.a. the first-digit law) is old hat for those in the mathematics posse, and has long been demystified, it's seeing increasingly frequent references in the online world: From blog subscriber counts, to advice for tax cheats ("make sure to distribute those numbers appropriately!"), to claims that it's a magic technique for detecting real or fake sequences of dice rolls (dubious) -- it's being portrayed as an infallible method of numerical omniscience, applicable anywhere that sets of numbers can be found.
There's a lot of truth out there, but there's also a lot of mistruth. So after seeing yet another incorrect application of Benford's law (where again it was presumed to magically apply to all number sets), I thought it worth throwing a quick entry together, adding in a little scripting goodness to demonstrate the point (the scripted section may not work in some aggregators and readers). This doesn't really relate to the normal subject matter of this blog, but hopefully it's interesting to people regardless.
I should add the warning that I am not a mathematician, and my interest in this subject came only as a passing interest several years back. It was then that I caught a television program featuring a pundit describing a technique he was advocating to catch fraudulent tax returns. By analyzing the distribution of leading digits on tax return rows, he claimed, they could accurately predict where numbers were artificially generated, and conversely where they were real.
The argument that he was proposing, and the cursory information I then found about this law, struck me as remarkably unintuitive (at the time, though now it seems embarrassingly obvious), so I spent a little time thinking about how this sort of numeric distribution comes about.

What I learned then was that the "law" predicts that approximately 30% of the numbers in certain sets of data -- in particular those with a logarithmic distribution (this will be discussed later) -- begin with the digit 1, with decreasing frequency for each successive digit (e.g. only about 4.6% of the numbers in a conforming set begin with a "9"). Of course this is all in regard to base-10 numbers.
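For reference, the predicted proportions all come from one formula: the expected share of values with leading digit d is log10(1 + 1/d). A small Python sketch (the function name is just for illustration) tabulates it for base 10:

```python
import math

def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9), base 10."""
    return math.log10(1 + 1 / digit)

# Print the predicted share for each possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

The nine shares sum to exactly 1, running from about 30.1% for a leading 1 down to about 4.6% for a leading 9.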
Purportedly the first known inklings of the law were described when Simon Newcomb, a Nova Scotian astronomer, noticed that certain pages of a logarithm book had far more wear than other pages, indicating that certain values appeared with more prevalence.
The reason for the uneven lookup wear became evident on further analysis: If one were to accumulate a vast reservoir of data on the populations of cities, the prices of menu items, and so on, the eerie presence of Benford's law would become evident, seemingly against common wisdom. Where one would expect numbers to cover the spectrum, instead the leading digit distribution predictions held true.
The following is a demonstration of Benford's Law materialized, with zero magic or alien intervention. Simply choose the settings (the defaults should be fine) and then click on "Initialize Random Set". This will give you a set of randomly distributed numbers between 0 and the max random number chosen. The table will display the prevalence of leading digits.
Thus far the numbers should be randomly distributed, risking a ticket from a Benford's law enforcement officer. Of course random or linearly distributed numbers aren't expected to conform to Benford's law, so that's entirely expected.
Now click on the "Inflation / Deflation!" button, which will randomly scale each value in the set to anywhere from 25% to 225% of its original value on each press.
Almost immediately the distribution will start to mirror Benford's Law. At most you might require two or three iterations until it accurately conforms.
Try it with a random starting max of 5 (thereby making the initial set only possibly contain the starting digits from 1-5) and then start scaling. Does Benford's Law appear?
| Leading Digit | Count | Proportion |
|---|---|---|
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
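Since the scripted demo may not run in every reader, here is a rough offline approximation in Python. It is a sketch and not the page's original script: the set size of 100,000, the starting maximum of 1000, the fixed seed, and all of the function names are assumptions chosen for illustration. Only the 25% to 225% scaling range comes from the description above.

```python
import random
from collections import Counter

def leading_digit(value: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{value:.17e}"[0])

def digit_shares(values):
    """Map each leading digit 1-9 to its share of the positive values."""
    digits = [leading_digit(v) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

def inflate(values, rng, low=0.25, high=2.25):
    """Scale every value by its own random factor between 25% and 225%."""
    return [v * rng.uniform(low, high) for v in values]

rng = random.Random(42)  # fixed seed so that runs are repeatable

# Stand-in for "Initialize Random Set": uniformly distributed values.
values = [rng.uniform(0, 1000) for _ in range(100_000)]
start_shares = digit_shares(values)

# Stand-in for three presses of "Inflation / Deflation!".
for _ in range(3):
    values = inflate(values, rng)
final_shares = digit_shares(values)

for d in range(1, 10):
    print(f"{d}: start {start_shares[d]:.1%}  after 3 presses {final_shares[d]:.1%}")
```

The starting column should sit near 11% for every digit, while the column after three presses should land close to the Benford curve, with roughly 30% of values leading with a 1.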
The explanation is simple and obvious once described: To go from 1 to 2, a number has to appreciate by 100%, whereas to go from 2 to 3 it would only have to appreciate by 50%. To go from 3 to 4 requires only a 33% increase.
This might seem irrelevant, as in a purely additive sense each step is the same +1 linear increase; however, in a logarithmic distribution (e.g. funding that increases or decreases 15% a year), increases and decreases are proportionate to the underlying value.
The same friction-of-appreciation also holds true going from 10 to 20, or 100 to 200, or 100000 to 200000, each representing a much more significant proportionate increase than the following 20 to 30 or 200 to 300 or 200000 to 300000.
For this particular sample, this materializes as random proportionate deltas having a higher probability of "skipping" the higher leading digits, while sticking to the lower leading digits. If an existing value is 50, for instance, and it's going to randomly increase by anywhere from 0 to 200%, that yields a 50% probability that the resulting value will have a leading digit of 1.
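The 50% figure can be checked directly. The sketch below assumes the increase is drawn uniformly, so a value of 50 ends up uniformly distributed between 50 and 150. The helper name and the range of exponents it scans are hypothetical choices made for this example.

```python
def leading_digit_probability(low: float, high: float, digit: int) -> float:
    """Chance that a value drawn uniformly from [low, high] starts with `digit`.

    Sums the overlap between [low, high] and each band
    [digit * 10^k, (digit + 1) * 10^k) for the exponents scanned.
    """
    overlap = 0.0
    # Assumption: values fall between 0.001 and 10 million, ample for this example.
    for exponent in range(-3, 7):
        scale = 10.0 ** exponent
        band_start, band_end = digit * scale, (digit + 1) * scale
        overlap += max(0.0, min(high, band_end) - max(low, band_start))
    return overlap / (high - low)

# A value of 50 that grows by 0% to 200% lands somewhere between 50 and 150.
for d in range(1, 10):
    print(d, f"{leading_digit_probability(50, 150, d):.0%}")
```

Half of the 50 to 150 range (100 through 150) leads with a 1, the digits 5 through 9 get 10% each, and 2, 3, and 4 are skipped entirely.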
Think of the population increase or decrease of a city -- it generally scales with the city. A large city might grow or shrink by 50,000 people year over year, albeit representing only a small percentage of the total population, while a small city might increase by 500 people. Yet as a percentage of population change they might be the same.
Similarly, an item at $10.00 will have to see a lot of inflation until it costs $20, but then it's a short ride to $30, and an even shorter hop to $40 -- proportionately speaking, of course.
And units of measure don't actually matter. After Benford's Law has appeared in the set, click on the Multiply Set button - this will multiply every set member by 3.75X (a completely arbitrary value)...yet the pattern remains.
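The same unit independence can be checked offline with a short Python sketch. It assumes a set that already conforms, built here by spreading values evenly across six orders of magnitude on a log scale. The set size, seed, and helper name are illustrative, and 3.75 is the arbitrary multiplier from the demo.

```python
import random
from collections import Counter

def digit_shares(values):
    """Share of values starting with each digit 1-9, as a list indexed from 0."""
    counts = Counter(int(f"{v:.17e}"[0]) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

rng = random.Random(7)  # fixed seed so that runs are repeatable

# Assumption: a conforming set, spread evenly over six decades on a log scale.
conforming = [10 ** rng.uniform(0, 6) for _ in range(100_000)]

# Stand-in for the "Multiply Set" button.
rescaled = [v * 3.75 for v in conforming]

before, after = digit_shares(conforming), digit_shares(rescaled)
for d in range(1, 10):
    print(d, f"{before[d - 1]:.1%}", f"{after[d - 1]:.1%}")
```

Both columns should track the Benford proportions, differing from each other only by sampling noise.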
Hopefully this has delivered a bit of food for thought about the applicability (or inapplicability) of Benford's law. It generally only fits larger sets of logarithmically distributed values, although that happens to be what many of the values in society, and in nature, are.
Some time back I wrote a brief entry regarding the adoption of products. In it, I made the glaringly obvious observation that many products that seem to be revolutionary, and that have taken the market by storm, really just made existing products or technologies slightly easier to use, or slightly more useful (as amazing and technologically remarkable as an iPod is, for most users it's functionally equivalent to a 1980 Walkman).
It's a better mousetrap model that has driven business, and consumerism, for decades.
In reverse, making something a little less convenient, and a little less accessible, works effectively at avoiding undesired behaviour in a target audience.
Anti-piracy efforts, for instance, have never been pursued with expectations of absolute success, and it really hasn't broken their model when someone sits in IRC #warez channels all day, and then puts their PC at risk of spyware/trojans/viruses with cracks and serial gens. It's the other 99% of the population that's the target of low-barrier anti-piracy technologies. Those are the people who would rather just pay $49.95 at the computer store than waste the time or take the risks.
Authorization, serial numbers, machine keying -- all of these are intended to make it just a little more of a hassle to use unauthorized copies, decreasing the casual piracy of normal people. Of course sometimes it backfires, and the anti-piracy techniques are more of a pain than the alternative, but that's another story.
Manipulating "ease of use" can work for self-control as well. A common bit of wisdom for those looking to pursue healthy eating is "avoid it once at the store, or avoid it countless times at home" -- if you can stop yourself from buying a bag of cookies or box of Ding Dongs at the grocery store, the adage goes, that one exercise of self-control will save you from having to use restraint countless times as said treats sit on your shelf, begging to be consumed. Sure, you could just hop in your car and go buy a box of Ding Dongs when the munchies hit, but for many people the desire is low enough that it isn't worth the trouble, and you either go without or choose something healthier.
This sort of "front-end self control" came to mind today as I analyzed the things that work, and the things that don't, in my weekly online adventures: Being in software development of course means consuming the news and information in the industry, and conversing (and hopefully debating) with informed, interesting people who have an enlightened point of view. The hope is to consume valuable, worthwhile information, and to engage in conversations that leave me feeling a little more knowledgeable.
On my web adventures, the things that work are those that move me towards goals, help my understanding of industry technologies and trends, or even just entertain me (all work is a recipe for trouble, and a funny YouTube video or The Onion article every now and then is very beneficial for productivity).
The things that don't are basically everything else, which is a set usually comprised of sites that I visit almost reflexively: Sites that once had merit for me personally, but no longer do (perhaps "we've grown apart", and they're at a technology level or scope that I'm not really interested in at this point, or perhaps their content has gone from quality to garbage), but I still find them sitting in my bookmarks listing, usually with shortcut keys.
I habitually find myself typing their URL without even really thinking about it. I'm human, and thus a creature of habit. Once there I'm invariably sucked into unfulfilling content, or annoying, unfulfilling debates.
Yet, while these sites have limited utility for me now, their "ease of use" is extraordinarily high simply as a function of acclimation and habit.
So, much like avoiding the bag of cookies at the grocery store, I've enacted some simple controls to make it just a little more of a hassle to visit them.
Of course there are many ways that I can circumvent this, most directly by just turning off the self-imposed "restrictions", but that's missing the point - that's like hopping in my car and driving to the grocery store because I feel like a cookie. It isn't going to happen simply because the functionality provided is far too low to offset the nuisance of getting there.