Dennis Forbes on Pragmatic Software Development
 
Friday, September 08 2006

I've never been a fan of The Daily WTF

While I can appreciate that it might elicit an occasional chuckle, and may even serve as a "here be dragons!" warning of the dangers of bad code or UIs, something about it just rubs me the wrong way.

The root of my dislike, I think, is my feeling that there's a bit too much of a schadenfreude thing going on much of the time. While I'm sure that the site is run and visited by a lot of great people, many of the regulars seem a little too eager to bask in the foibles of others, presumably imagining themselves righteous fonts of perfection and clarity.

Even the most benign and negligible choice of bitmasks is apparently worthy of mockery.

That would be fine, and it would simply be a site I don't visit; however, a bit of a meme lately (given the current SuperStar! Developer! thing going on) is something along the lines of "Do you want a great programmer, or someone whose code appears on TheDailyWTF?" This is a repeating theme: On the one side are the great programmers, and on the other are the people endlessly bound to give TheDailyWTF source material.

Do people really think such a schism exists? Is the impression that great developers are infallible, never creating any bad code at all, ever? Are bad programmers just stumbling from one WTF to another?

Of course not.

I fear the output of any developer who claims that they've never written bad code. I would fear them because they're either bald-faced liars -- believing that simply saying it repeatedly will somehow convince others of this fiction -- or they're completely blind to their own weaknesses.

Every developer in the real world has had bad days, brain faults, or bad interpretations of new languages, environments or libraries. It's simply a given of the profession.

Building a myth of perfection fools no one.

Friday, September 08 2006

I'm a big fan of fall fairs.

Rockton Fair Entrance

The greasy food. The animal displays. The cute baby contests. The best-of-category vegetable competitions. The tractor pulls. How couldn't you love a good agricultural society fall fair?

Vegetable Competition

And it isn't as if I wallow in it, imagining myself a know-it-all, seen-it-all urbanite taking the chance to get a chuckle at the rural folk. In actuality I often marvel at how ideal and livable many of these small towns are.

Lucky for me and the family then that Ontario has a rich calendar of fall fairs occurring over the coming weeks.

Which brings me to websites - we choose which fall fairs to attend each season entirely based upon their websites (linked from entries in that directory). It gives us comfort that we're not going to drive an hour to find a shack with a couple of pigs running free, and of course with dozens of fairs occurring (and a high degree of mobility allowing us a wide range of possibilities) we want to pick the most interesting one each weekend. While I doubt that all that many of their visitors are provincial tourists browsing the OAAS website, it does seem to be a good idea to spend just a little time on the website.

Which brings me to this fair. Contained on a largely useless website (no offense intended to the author -- I doubt it was their highest priority) is the notice that they couldn't post the prize list because of lack of resources. They follow it up with a petition for volunteers.

Really, how tiny must a community be that there isn't anyone willing to scan a printout, or better yet type it in? Doesn't this town have a single eager-to-look-like-a-hacker 11-year-old (or 40-year-old, for that matter, or 70-year-old)? How is it even possible that they can't find the resources for this?

In any case, it's clear that the fall fair industry is desperately in need of a good content management system. I'm surprised that the OAAS doesn't offer them a half-decent shell to advertise their event ("Please check off each event your fair contains - baby beauty contest [ ] apple bobbing [ ]"...).

  Personal 
Tuesday, September 05 2006

While Benford's law (a.k.a. the first-digit law) is old hat for those in the mathematics posse, and has long been demystified, it's seeing increasingly frequent references in the online world: From blog subscriber counts, to advice for tax cheats ("make sure to distribute those numbers appropriately!"), to claims that it's a magic technique for detecting real or fake sequences of dice rolls (dubious) -- it's being portrayed as an infallible method of numerical omniscience, applicable anywhere that sets of numbers can be found.

There's a lot of truth out there, but there's also a lot of mistruth. So after seeing yet another incorrect application of Benford's law (where again it was presumed to magically apply to all number sets), I thought it worth throwing a quick entry together, adding in a little scripting goodness to demonstrate the point (the scripted section may not work in some aggregators and readers). This doesn't really relate to the normal subject matter of this blog, but hopefully it's interesting to people regardless.

I should add the warning that I am not a mathematician, and my interest in this subject came only as a passing interest several years back. It was then that I caught a television program featuring a pundit describing a technique he was advocating to catch fraudulent tax returns. By analyzing the distribution of leading digits on tax return rows, he claimed, they could accurately predict where numbers were artificially generated, and conversely where they were real.

The argument that he was proposing, and the cursory information I then found about this law, struck me as remarkably unintuitive (at the time, though now it seems embarrassingly obvious), so I spent a little time thinking about how this sort of numeric distribution comes about.

Leading Digit Chart

What I learned then was that the "law" predicts that roughly 30% of numbers in certain sets of data -- in particular those with a logarithmic distribution (this will be discussed later) -- begin with the digit 1, with decreasing frequency for each successive digit (numbers beginning with a 9, for example, make up only about 4.6% of a set conforming to the law. Of course, this is all in regards to base-10 numbers).
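For reference, the percentages above come from the standard statement of the law: the expected share of values whose leading digit is d is log10(1 + 1/d). A minimal sketch that tabulates it (the function name is just for illustration):

```python
import math

def benford_probability(digit: int) -> float:
    """Expected share of values whose leading base-10 digit is `digit` under Benford's law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:6.1%}")
```

Because the nine terms are log10(2/1) + log10(3/2) + ... + log10(10/9), they telescope to log10(10) = 1, so the shares always sum to exactly one.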

Purportedly the first known inklings of the law were described when Simon Newcomb, a Nova Scotian astronomer, noticed that certain pages of a logarithm book had far more wear than other pages, indicating that certain values appeared with more prevalence.

The reason for the uneven lookup wear became evident on further analysis: If one were to accumulate a vast reservoir of data on the populations of cities, the prices of menu items, and so on, the eerie presence of Benford's law would become evident, seemingly against common wisdom. Where one would expect numbers to cover the spectrum evenly, instead the leading digit distribution predictions held true.

The following is a demonstration of Benford's Law materialized, with zero magic or alien intervention. Simply choose the settings (the defaults should be fine) and then click on "Initialize Random Set". This will give you a set of randomly distributed numbers between 0 and the max random number chosen. The table will display the prevalence of leading digits.

Thus far the numbers should be uniformly distributed, risking a ticket from a Benford's law enforcement officer. Of course, uniformly or linearly distributed numbers aren't expected to conform to Benford's law, so that's no surprise.

Now click on the "Inflation / Deflation!" button, which will randomly scale each value in the set to anywhere from 25% to 225% of its original value on each press.

Almost immediately the distribution will start to mirror Benford's Law. At most you might require two or three iterations until it accurately conforms.

Try it with a random starting max of 5 (thereby making the initial set only possibly contain the starting digits from 1-5) and then start scaling. Does Benford's Law appear?
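If the interactive version doesn't run in your reader, the same experiment can be approximated offline. This is a sketch under a few assumptions: the starting values are whole numbers from 1 to a "Random Max" of 5 (zero has no leading digit, so it is skipped), and each press scales every value by a factor drawn uniformly between 0.25 and 2.25. The helper names are hypothetical, not taken from the page's own script.

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant base-10 digit of a positive number."""
    # Scientific notation always starts with the first significant digit, e.g. '3.750000e+02'.
    return int(f"{x:e}"[0])

def digit_shares(values):
    """Proportion of values whose leading digit is 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts.get(d, 0) / len(values) for d in range(1, 10)}

def inflate_deflate(values, rng, low=0.25, high=2.25):
    """One press of the button: scale every value by a random factor between 25% and 225%."""
    return [v * rng.uniform(low, high) for v in values]

rng = random.Random(42)
# Assumption: whole numbers from 1 to a "Random Max" of 5 (zero has no leading digit).
values = [float(rng.randint(1, 5)) for _ in range(10_000)]
initial = digit_shares(values)

for _ in range(5):  # five presses of "Inflation / Deflation!"
    values = inflate_deflate(values, rng)
final = digit_shares(values)

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
for d in range(1, 10):
    print(d, f"start {initial[d]:.3f}", f"after {final[d]:.3f}", f"Benford {benford[d]:.3f}")
```

The starting set has no values at all beginning with 6 through 9, yet after a handful of random proportional rescalings every digit lands close to its Benford share.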

Benford's Law Demonstration [interactive widget: inputs for Number of Random Values and Random Max, with a table of Count and Proportion for each leading digit from 1 to 9]

The explanation is simple and obvious once described: To go from 1 to 2, a number has to appreciate by 100%, whereas to go from 2 to 3 it would only have to appreciate by 50%. To go from 3 to 4 requires only a 33% increase.

This might seem irrelevant, as in a purely additive sense each step is the same +1 increase; however, in a logarithmic distribution (e.g. funding that increases or decreases 15% a year), increases and decreases are proportional to the underlying value.

The same friction-of-appreciation also holds true going from 10 to 20, or 100 to 200, or 100000 to 200000, each representing a much more significant proportionate increase than the following 20 to 30 or 200 to 300 or 200000 to 300000.
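The arithmetic behind that "friction" is easy to check: the percentage increase needed to move from one leading digit to the next depends only on the digit, not on the magnitude. A small sketch (the helper name is illustrative):

```python
def pct_increase(start: float, end: float) -> float:
    """Percentage increase needed to grow from `start` to `end`."""
    return (end / start - 1) * 100

# Going 1 -> 2, 2 -> 3 and 3 -> 4 at several magnitudes.
for scale in (1, 10, 100, 100_000):
    steps = [pct_increase(d * scale, (d + 1) * scale) for d in (1, 2, 3)]
    print(scale, [f"{s:.1f}%" for s in steps])
```

Every row reads 100%, 50%, 33.3%. On a logarithmic scale the width of each leading-digit band is log10((d + 1) / d), which is exactly the share Benford's law assigns to that digit.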

In this demonstration, that means random proportional changes are more likely to "skip" past the higher leading digits while lingering on the lower ones. If an existing value is 50, for instance, and it's going to randomly increase anywhere from 0 to 200%, there is a 50% probability that the resulting value will have a leading digit of 1.
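The 50% figure follows directly if the increase is uniform (an assumption, since the text only says "randomly"): the new value lands uniformly in [50, 150], and only the [100, 150) half starts with a 1. A quick exact calculation alongside a simulation:

```python
import random

def leading_digit(x: float) -> int:
    """First significant base-10 digit of a positive number."""
    return int(f"{x:e}"[0])

# Exact: new value = 50 * (1 + g), g uniform in [0, 2], so the result is uniform on [50, 150].
# It begins with a 1 only when it lands in [100, 150), which is half of that range.
exact = (150 - 100) / (150 - 50)

rng = random.Random(7)
trials = 100_000
hits = sum(leading_digit(50 * (1 + rng.uniform(0.0, 2.0))) == 1 for _ in range(trials))
estimate = hits / trials

print(exact, round(estimate, 3))
```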

Think of the population increase or decrease of a city -- it generally scales with the city. A large city might grow or shrink by 50,000 people year over year, albeit representing only a small percentage of the total population, while a small city might increase by 500 people. Yet as a percentage of population change they might be the same.

Similarly, an item at $10.00 will have to see a lot of inflation until it costs $20, but then it's a short ride to $30, and an even shorter hop to $40 -- proportionately speaking, of course.
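To put numbers on the $10 example, assume a hypothetical steady 3% annual inflation rate; the time to move between prices is then log(end / start) / log(1 + rate):

```python
import math

def years_to_grow(start: float, end: float, annual_rate: float) -> float:
    """Years of compounding at `annual_rate` needed to move a price from `start` to `end`."""
    return math.log(end / start) / math.log(1 + annual_rate)

rate = 0.03  # hypothetical 3% annual inflation
for start, end in [(10, 20), (20, 30), (30, 40)]:
    print(f"${start} -> ${end}: {years_to_grow(start, end, rate):.1f} years")
```

At that rate the first dollar-doubling takes roughly 23 years, the next step roughly 14, and the one after roughly 10, so prices spend far longer with a leading 1 than with a leading 3.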

And units of measure don't actually matter. After Benford's Law has appeared in the set, click on the Multiply Set button - this will multiply every set member by 3.75X (a completely arbitrary value)...yet the pattern remains.
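The same scale invariance can be checked outside the browser. In this sketch, a log-uniform set spanning exactly six decades stands in for a Benford-conforming set (an assumption for illustration), and every member is multiplied by the same arbitrary 3.75:

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant base-10 digit of a positive number."""
    return int(f"{x:e}"[0])

def digit_shares(values):
    """Proportions of leading digits 1..9, as a list indexed 0..8."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts.get(d, 0) / len(values) for d in range(1, 10)]

rng = random.Random(3)
# Assumption: log-uniform values between 1 and 1,000,000 represent a Benford-conforming set.
values = [10 ** rng.uniform(0, 6) for _ in range(50_000)]
scaled = [v * 3.75 for v in values]  # the demo's arbitrary multiplier

before, after = digit_shares(values), digit_shares(scaled)
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
for d in range(1, 10):
    print(d, f"{before[d - 1]:.3f}", f"{after[d - 1]:.3f}", f"{benford[d - 1]:.3f}")
```

Multiplying by a constant only shifts every value the same distance along a logarithmic axis, so a set whose logarithms are spread evenly stays spread evenly, and the leading-digit shares don't move.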

Hopefully this has delivered a bit of food for thought about the applicability (or inapplicability) of Benford's law. It generally only fits larger sets of logarithmically distributed values, although that happens to be what many of the values in society, and in nature, are.

  Personal 
Monday, September 04 2006

Some time back I wrote a brief entry regarding the adoption of products. In it, I made the glaringly obvious observation that many products that seem to be revolutionary, and that have taken the market by storm, really just made existing products or technologies slightly easier to use, or slightly more useful (as amazing and technologically remarkable as an iPod is, for most users it's functionally equivalent to a 1980 Walkman).

It's the better-mousetrap model that has driven business, and consumerism, for decades.

In reverse, making something a little less convenient, and a little less accessible, works effectively at avoiding undesired behaviour in a target audience.


Anti-piracy efforts, for instance, have never been pursued with expectations of absolute success, and it really hasn't broken their model when someone sits in IRC #warez channels all day, and then puts their PC at risk of spyware/trojans/viruses with cracks and serial gens. It's the other 99% of the population that's the target of low-barrier anti-piracy technologies. Those are the people who would rather just pay $49.95 at the computer store than waste the time or take the risks.

Authorization, serial numbers, machine keying -- all of these are intended to make it just a little more of a hassle to use unauthorized copies, decreasing the casual piracy of normal people. Of course sometimes it backfires, and the anti-piracy techniques are more of a pain than the alternative, but that's another story.

Manipulating "ease of use" can work for self-control as well. A common bit of wisdom for those looking to pursue healthy eating is "avoid it once at the store, or avoid it countless times at home" -- If you can stop yourself from buying a bag of cookies or box of ding dongs at the grocery store, the adage goes, that one exercise of self-control will save you from having to use restraint countless times as said treats sit on your shelf, begging to be consumed. Sure, you could just hop in your car and go buy a box of ding dongs when the munchies hit, but for many people the desire is low enough that it isn't worth the trouble, and you either go without or choose something healthier.

This sort of "front-end self control" came to mind today as I analyzed the things that work, and the things that don't, in my weekly online adventures: Being in software development of course means consuming the news and information in the industry, and conversing (and hopefully debating) with informed, interesting people who have an enlightened point of view. The hope is to consume valuable, worthwhile information, and to engage in conversations that leave me feeling a little more knowledgeable.


On my web adventures, the things that work are those that move me towards goals, help my understanding of industry technologies and trends, or even just entertain me (all work is a recipe for trouble, and a funny YouTube video or The Onion article every now and then is very beneficial for productivity).

The things that don't are basically everything else, which is a set usually comprised of sites that I visit almost reflexively: Sites that once had merit for me personally, but no longer do (perhaps "we've grown apart", and they're at a technology level or scope that I'm not really interested in at this point, or perhaps their content has gone from quality to garbage), but I still find them sitting in my bookmarks listing, usually with shortcut keys.

I habitually find myself typing their URL without even really thinking about it. I'm human, and thus a creature of habit. Once there I'm invariably sucked into unfulfilling content, or annoying, unfulfilling debates.

Yet, while these sites have limited utility for me now, their "ease of use" is extraordinarily high simply as a function of acclimation and habit.

So, much like avoiding the bag of cookies at the grocery store, I've enacted some simple controls to make it just a little more of a hassle to visit them.

  • I've "banned" them in Adblock. This basically means that visiting them in Firefox -- my browser of choice -- requires reconfiguring the Adblock rules, marginally decreasing the ease of use. Sure, I could just jump over to another browser, but losing the enormous functionality and benefits of Firefox with my select set of extensions dramatically lowers the "ease of use" factor, eliminating any marginal interest in visiting these sites.
  • I've "banned" them in my router. Most routers nowadays include layer 5-7 filtering, allowing them to inspect and block requests for certain domains/URLs. I visit this section so infrequently, and the interface is so terrible, that it smashes ease of use to the ground, despite acclimatization and habit.
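Router domain filters are usually opaque, but the basic idea of blocking a domain along with all of its subdomains can be sketched as a suffix match. The domain names here are placeholders, and the matching rule is an assumption rather than any particular router's behaviour:

```python
BLOCKED_DOMAINS = {"example.com", "example.org"}  # hypothetical self-imposed blocklist

def is_blocked(hostname: str, blocked=BLOCKED_DOMAINS) -> bool:
    """True if `hostname` is a blocked domain or any subdomain of one."""
    host = hostname.lower().rstrip(".")
    labels = host.split(".")
    # Check every suffix: "www.example.com" -> "www.example.com", "example.com", "com".
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))

for host in ("www.example.com", "example.com.", "notexample.com", "example.net"):
    print(host, is_blocked(host))
```

Matching on whole labels rather than raw string endings is what keeps "notexample.com" from being caught by a rule meant for "example.com".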

Of course there are many ways that I can circumvent this, most directly by just turning off the self-imposed "restrictions", but that's missing the point - that's like hopping in my car and driving to the grocery store because I feel like a cookie. It isn't going to happen simply because the functionality provided is far too low to offset the nuisance of getting there.

 

Thursday, August 31 2006

I was leafing through yesterday's dead-tree print edition of the National Post when an article at the front of the "FP Working" section caught my eye: It looked to be a piece on immigration, hinted at by the accompanying graphic of a shipping box containing several goofy-looking "techie" caricatures. The box was drawn with the label "URGENT, SHIP OVERNIGHT TO CANADA", and was emblazoned with the flag of India.

Given that Canada has the highest per-capita immigration rate in the world, articles on the topic are often thought-provoking and informative, and usually serve as worthwhile fodder for debate, so I gave it a closer look.

Disappointingly, it turned out instead to be a hilarious piece about offshoring[*1], written with such wide-eyed naivety that I had to check the date on the paper to see if I accidentally pulled out one from half a decade ago, when this sort of "win/win!" nonsense was even remotely believable.

Without tearing apart the numerous fundamental errors in this terrible article[*2] (oh, but everyone is doing it, the article claims, so surely it must be true?), I'd rather simply point you to Paul Graham's excellent essay on covert public relations via submarine PR.

If the author (and/or the paper) didn't get a handsome kickback, they were robbed. I was surprised (and dismayed) that there weren't "advertorial!" disclaimers atop it.

*1 - And I'm anything but an isolationist, and entirely believe in globalization. Apart from minor transition hiccups, the end result is the enrichment of everyone.

*2 - In the case of offshoring, the early "cheapness" quickly faded as knowledge workers in India and China decided that their lot in life wasn't just to be producers for the West, but rather that they too wanted a chance to "consume". Wages have been increasing at a torrential pace, and getting anyone anywhere above terribly incompetent has been described as close to impossible (while there are huge numbers of great talents, there are also huge numbers of potential employers. Bob's Oil Cleaning is going to get the absolute dregs of the dregs of the dregs working on their app - and that's ignoring the ridiculous contention that having the app developed during the twilight hours is somehow advantageous for an industry that hasn't even come to grips with telecommuting - leading to quality problems that are becoming a serious concern in the industry)

Saturday, August 19 2006

An oft-referenced problem in the Windows world is .Dll Hell (*). It occurs when many applications depend upon the code in a shared .dll (a dynamic link library, which is basically code that is linked at runtime rather than compile time). Sharing is often ideal, given that you can fix a security fault in one single location rather than recompiling and redistributing every application that statically linked the library, or searching for disparate private copies scattered across volumes. Problems start to happen, however, if the dll is changed in a way that breaks some of its dependent consumers (for instance, one of the applications rolls out a new version that changes the external API), causing inconsistencies or outright failures in other applications.
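The failure mode can be modelled in a few lines. This is a toy sketch, not how the Windows loader works: there is one shared slot per library name, and installing a library simply overwrites it. The library, application, and function names are all hypothetical.

```python
# A single shared slot per library name, as with a shared system DLL (toy model).
shared_libs = {}

def install_shared(name, version, api):
    """Installing overwrites the one shared copy that every consumer will load."""
    shared_libs[name] = {"version": version, "api": set(api)}

def run_app(app_name, lib_name, needed_call):
    """Simulate an application loading the shared library and calling one export."""
    lib = shared_libs[lib_name]
    if needed_call not in lib["api"]:
        return f"{app_name}: FAILED, {lib_name} {lib['version']} has no {needed_call}()"
    return f"{app_name}: ok with {lib_name} {lib['version']}"

install_shared("imaging.dll", "1.0", {"load_image", "save_image"})
print(run_app("OldViewer", "imaging.dll", "save_image"))

# A newer application ships an "upgraded" imaging.dll whose external API changed.
install_shared("imaging.dll", "2.0", {"load_image", "write_image"})
print(run_app("NewEditor", "imaging.dll", "write_image"))
print(run_app("OldViewer", "imaging.dll", "save_image"))  # the older consumer now breaks
```

Nothing about OldViewer changed; it broke purely because another installer replaced the copy it shared.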

[* - Sidenote: The Wikipedia article linked from DLL Hell claims that the term was "introduced to the general public by Rick Anderson" in 2000. This is, of course, complete and utter nonsense -- it was a very common piece of terminology many years earlier, and an MSDN article hardly introduces it to the "general public". I come across these sorts of historical revisionisms on Wikipedia far too many times. Is it a Wikiality? I suppose I "introduced SVG to the general public" when I wrote a "paper" for MSDN Magazine, so I should go claim my crown...]

While the problem already existed for classic-code dlls that were stored in a shared location (usually for space-saving reasons), it really became a problem with OCX/COM, where the activation architecture basically demanded that you use the shared copy.

In spirit, similar problems occur even with high-level platforms such as Apache, or even just modules like PHP, where a version change can break a lot of applications that run atop it or depend upon it, causing significant heartache and making deployment issues much more complex (particularly when you have multiple dependent applications, some of them more adaptable than others).

There have been many declarations of an "end to DLL hell!", with Microsoft pushing various approaches and strategies, to varying success.

With .NET, the solution is generally "share nothing", to the point that even the various versions of the .NET runtime exist as islands. A .NET 2.0 application keeps all of its libraries local, even components used by dozens of applications, and runs on the .NET 2.0 framework island and runtime (binding is often version-specific, so if the same library exists in many applications but the versions differ slightly, each copy is loaded and mapped separately), while a .NET 1.1 application exists in its own little world, and the same goes for a .NET 1.0 app. A classic "shared activation" model still exists via the global assembly cache (GAC); however, it's a little-used bit of infrastructure.
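The side-by-side idea can be sketched as resolution keyed on the exact (name, version) pair, so different versions coexist instead of overwriting each other. This is a loose toy model: the class, function, and library names are hypothetical, and the lookup order is a simplification rather than a description of the CLR's actual probing rules.

```python
from dataclasses import dataclass, field

@dataclass
class App:
    name: str
    references: dict                                  # library name -> exact version the app was built against
    private_libs: dict = field(default_factory=dict)  # (name, version) -> copy shipped in the app's own folder

# Shared store keyed by (name, version): versions sit side by side (loosely modelled on the GAC).
shared_store = {("Logging", "1.0"): "Logging 1.0 (shared store)"}

def resolve(app: App, lib_name: str) -> str:
    """Find the exact version this app references; never substitute a different one."""
    key = (lib_name, app.references[lib_name])
    if key in shared_store:          # this sketch checks the shared store first
        return shared_store[key]
    if key in app.private_libs:      # then the app's own private copy
        return app.private_libs[key]
    raise LookupError(f"{app.name}: {lib_name} {key[1]} not found")

reporting = App("Reporting", {"Grid": "1.1", "Logging": "1.0"},
                {("Grid", "1.1"): "Grid 1.1 (Reporting's private copy)"})
billing = App("Billing", {"Grid": "1.2", "Logging": "1.0"},
              {("Grid", "1.2"): "Grid 1.2 (Billing's private copy)"})

for app in (reporting, billing):
    print(app.name, "->", resolve(app, "Grid"), "|", resolve(app, "Logging"))
```

Both applications use "Grid", but each gets the exact version it was built against, so upgrading one can't break the other; the cost is the duplicated storage and memory the next paragraph shrugs off.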

Storage space is incredibly cheap, and memory space is becoming a non-issue, so this sort of approach has a lot of merit.

Why not take it a level higher? With massively powerful servers, seemingly endless memory, and free virtual server products (from both VMWare and Microsoft), we're entering an era when it is entirely possible, and often ideal, to release your product as a complete virtual server.

Of course, I'm repeating myself now, but this idea really appeals to me.

Some time back, for instance, I was considering making a commercial, corporate web application timesheet tracking system (I've made some of these before. One particular one - an AJAXish DHTML solution I made back in the late 90s - I still think beats out most of what I see today), however a hosted model wouldn't fly with most customers given the amount of information that could be garnered from their timesheets: Many customers would want to host it themselves. Yet then you face the dilemma of releasing a product that can exist within their current architecture and skillset, a particularly onerous task given the many dependencies of a modern web application.

Inevitably you'd be putting yourself out of contention for a lot of customers because you used X instead of Y, and would be endlessly fielding support issues when their platform changed faster (or slower) than your application did.

So why not release your web application (or any type of application) on an "appliance" virtual machine, as it's now getting named? The same goes for application "consumption": If you're a Windows shop, instead of hosting your wiki on Windows, or far worse limiting your choices to the small selection of options that exist for your particular ecosystem of dependencies, perhaps you could just deploy a Wiki appliance with the perfectly ideal configuration of database server, web server, host operating system, and modules.

Configure your appliance to only allow port 80 traffic in (or better yet work on an appliance platform where the accessible ports on each virtual machine can be configured, perhaps by a separate "firewall" virtual machine), and live in an application model, with whatever version of MySQL, Postgresql, or Apache you want, custom configured in a way that perfectly matches your requirements.
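The "only port 80 in" policy is a default-deny allow-list. A real appliance would express it in its host's or firewall VM's own rule syntax; this hypothetical Python sketch only illustrates the evaluation logic:

```python
from typing import NamedTuple

class Packet(NamedTuple):
    protocol: str   # "tcp" or "udp"
    dst_port: int

# Hypothetical inbound policy for a wiki appliance: only HTTP gets in.
ALLOWED_INBOUND = {("tcp", 80)}

def allow_inbound(pkt: Packet, allowed=ALLOWED_INBOUND) -> bool:
    """Default deny: accept a packet only if its (protocol, port) pair is explicitly listed."""
    return (pkt.protocol.lower(), pkt.dst_port) in allowed

for pkt in (Packet("tcp", 80), Packet("tcp", 3306), Packet("tcp", 22), Packet("udp", 80)):
    print(pkt, "ALLOW" if allow_inbound(pkt) else "DROP")
```

Listing what is permitted, rather than what is forbidden, means the appliance's internal MySQL or PostgreSQL port stays unreachable even if nobody remembers to block it.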

Virtual machines have so many advantages, not the least of which is the ability to move them between hardware with minimal hassle. Indeed, I had exactly this scenario recently, where the Team Foundation Server application tier was running on a box that was getting a little overloaded...well it was just a virtual machine, so it was nothing more than pausing the state, moving it to another virtual server hosting box, and starting it up. This balanced the load better, and was completely transparent to the users.

There are downsides that would have to be taken into account - some shops might want a better backup solution than pausing the virtual machine and archiving the entire virtual hard drive (which is, I should mention, a wonderful capability -- the entire "machine" in one single, relatively small file, atomically copyable and restorable. In development I've used this endlessly to save various platform configurations, restoring to exactly the one that is pertinent for a particular need), however there are endless possible, application-specific solutions to this sort of problem.

There's also the issue that Microsoft doesn't take kindly to releasing virtual machines based on their software, so perhaps this is a model that works best when the software you're depending upon is freely distributable (within the confines of the license).

Tuesday, August 01 2006

Stephen Colbert did a humorous segment on truthiness last night, this time on the topic of historical revisionism. You can view the clip at http://www.youtube.com/watch?v=zmHm0rGns4I.

What made this segment particularly famous (or infamous) were Stephen's (Mr. Colbert's?) comments regarding the illustrious Wikipedia: After indicating that he was revising some entries to alter history, pretending to do it on a laptop during taping (a supposed Stephencolbert user pretty much simultaneously -- to the airing, not the taping -- made a couple of edits correlating with the show. These edits were on the topic of George Washington and  The Colbert Report recurring elements, exactly as indicated on the show. This user could just as easily have been a third party following along, but the effect is the same, and is just as humorous), he then coined the new word Wikiality.

What really raised the ire of the Wikipedia defenders, however, was Mr. Colbert's humorous petition for users to support him in his quest for historical revisionism, altering the Wikipedia entry for elephants to support a fictional 3x increase in the total population over the past 6 months ("Explain that Al Gore!"). Many played along, until eventually the page in question (and virtually all other pages related to elephants) was locked to avoid this jovial vandalism.

Personally I think Stephen made a brilliant point, even if there was a bit of collateral damage. Some of the reaction to it has simply been ridiculous.

  • This isn't really an indictment of Wikipedia (and in fact further solidifies its place in mainstream culture).

  • It isn't a "validation of Wikipedia" that these vandalisms were caught and reverted, or that protections were put in place (basically undermining the core principles of Wikipedia, albeit temporarily). 

    The people and groups truly dedicated to revisionism generally don't advertise their actions nationwide (which brings up the scary fact that a remarkable number of people viewing the Colbert Report don't realize that it's satire).

    If a completely independent enthusiast of sugar-and-sugar-producers decides to alter sections of the sucralose entry to highlight (exaggerate?) possible health risks, would the Wikipedia editors (who themselves aren't immune from suspicion) know that it's misinformation? Would a sucralose defender be just as motivated to monitor the site to ensure that the representation remains fair and balanced, or would they be skewing things the opposite way? Is it misinformation if it's using the ambiguity of the English language to accentuate some points that support one's interests, while undermining those that counter it?

  • Such a user-contributed system will invariably have submarined biases depending upon who is the most motivated to contribute and to police. The perspective of widespread internet-enabled countries, for instance, will be magnified and dominant (have you ever noticed how widespread the Canadian perspective is on the English internet? We have quite a few people with high-speed connections cluttering the English language sites. It's often cold with nothing to do, so we spend a lot of time online). The perspective of special interests and motivated minorities will be magnified.

    I don't mean to single this particular group out, but I recently heard about a group of Israel supporters that have taken to swarming sites and online polls to skew them towards the Israeli perspective. They've even gone and built a system tray application to make sure supporters are alerted when distributed vote stuffing is called for.

  • Stephen Colbert didn't suddenly make nefarious agents aware of something they previously had no idea about -- there have been professional groups, with offices and everything, working to subvert information to favour certain interests for decades, and they've certainly used their techniques on the net as well. Thinking that it was safe until Stephen Colbert mentioned it is as logical as saying the water supply is safe until someone postulated that it could be poisoned by terrorists.

Of course these, err, truths hold for more than just Wikipedia: Virtually any user-contributed site faces the same problems.

Reddit, for instance -- an up-and-coming meme site -- lets users vote on which links and comments are most, well, in line with their own views (while the links get rated on what is occasionally a meritocracy, the voting on comments is usually extremely one-sided, having very little to do with presenting a valid, well-spoken argument, and more to do with saying something that correlates with every fly-by voter's opinion. It is actually embarrassing seeing your own comment scored up because you happen to share the majority view, while the well-written and convincing posts of your adversary sink into underflow territory).

Using an apparently basic votes-over-time algorithm, the app determines which links to put on the front page -- getting on the front page is obviously a desirable place to be for someone trying to push a perspective or an agenda (remember that the majority of users on most of these sites are lurkers - while many people are set in their position, and are valiant, tireless crusaders for the cause, there are a lot of people who are on the fence, absorbing whatever information on a topic is presented to them, willing to change their position based upon new inputs). My personal experience here has been that a Reddit front page isn't anything like a Digg front page in the volume of traffic it sends to you, but it still brings in a considerable number of users ready and waiting to be stuffed with one's perspective.

So how does one get on the front page? Well aside from pandering to the natural bias of the Reddit crowd (a crowd that leans towards libertarianism/anti-authority/extreme liberalism/Lispism, a demographic that is heavily reflected in the vote patterns), getting on the front page can be accomplished with little more than a dozen votes over a short period of time (one vote per IP, folks). Staying on the front page for a work day can be accomplished with just a couple hundred up votes.
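To see why a dozen quick votes can be enough, consider a hypothetical votes-over-time score: the logarithm of net votes plus a bonus proportional to submission time. This is an illustrative assumption, not Reddit's actual algorithm; the epoch and the 45,000-second constant are arbitrary choices for the sketch.

```python
import math
from datetime import datetime, timedelta

def hot_score(net_votes: int, submitted: datetime, epoch: datetime,
              seconds_per_tenfold: float = 45_000) -> float:
    """Hypothetical rank: log-scaled net votes plus a bonus that grows with submission time.

    Being `seconds_per_tenfold` newer is worth as much as having ten times the votes.
    """
    order = math.log10(max(abs(net_votes), 1))
    sign = (net_votes > 0) - (net_votes < 0)
    age_bonus = (submitted - epoch).total_seconds() / seconds_per_tenfold
    return sign * order + age_bonus

epoch = datetime(2006, 1, 1)
now = datetime(2006, 8, 1, 12, 0)
fresh = hot_score(12, now, epoch)                          # a dozen votes, just submitted
older = hot_score(200, now - timedelta(hours=24), epoch)   # 200 votes, a day old
print(f"fresh: {fresh:.3f}  older: {older:.3f}")
```

Under a log-of-votes scheme like this, each additional vote matters less than the last, while recency keeps accruing, so a small, well-timed burst from a coordinated group can outrank a day-old link with many times the votes.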

Topping the all time record books in the Reddit universe takes less than 900 up votes after the negations have been subtracted out.

How hard is it for a special interest to manipulate a site like this? The Israeli support site up above is fairly open and inclusive in their advocacy, but surely all such groups aren't so forthright -- Microsoft has some 60,000 employees, and while they have a limited number of work IPs, there are certainly tens of thousands of home IPs that can be used to push an agenda. The same goes for oil companies, and virtually any other large organization.

While I hardly think an organization like Microsoft is going to enlist employees in a concerted astroturfing drive, is it difficult to imagine that there are similar groups doing something similar right now? I've already pointed out the Israel support orchestration above (and for those who would argue that their votes are legitimate, the problem is that it's completely disproportionate. It's why user-initiated poll submissions and feedback comments are usually absurdly skewed and not-correlating with reality - the people voting often have a vested interest motivating their actions), and every bit of common sense says that they aren't alone.

Once this approach has been mastered at Reddit, move on to the larger meme sites like Digg - a couple of thousand votes on Digg is all it takes. Hire some botnet authors if need be.

(Note: This isn't intended to be a "how to", but this issue has bothered me for a while. Before hearing about the Giyus site mentioned above, I had already considered making an opinion swarming coordination web app, allowing groups to administer and privately coordinate opinion bombing runs. My goal was to highlight a potential problem, rather than enabling this sort of activity)

  Personal 


Dennis Forbes - Dennis Forbes is a Toronto-based software architect and technology writer