Joel Spolsky, the well-known blogger and ISV owner, kicked up quite a storm recently with his piece entitled Language Wars [for those following the `debate', yes, I'm late to the party on this. I make it a general standard to avoid responding to blogs on here -- the whole blog thing is entirely too recursive -- but some recent reactions to his piece pushed me to post].
The article leads off with some pragmatic wisdom, advising enterprise-y, low-risk type shops to use well-known and well-proven technology stacks -- solid advice that's hard to argue with -- yet he then ends the piece with a comment about an in-house, next-generation, super-duper language being used to develop FogCreek's premiere product, FogBugz.
The discord was so great that most readers presumed that the Wasabi thing was a joke, or alternately that the rest of the article was the joke (which would have been an awesome revelation). Much confusion ensued, to the point that Joel had to put up a post clarifying that he was actually serious about the Wasabi thing
Aside from the seeming hypocrisy, what really instantiated some JoelCritic<T> instances (via the BlogCriticFactory) were Joel's comments about Ruby, where he seemingly indicated that it wasn't ready for prime time.
...but for Serious Business Stuff you really must recognize that there just isn't a lot of experience in the world building big mission critical web systems in Ruby on Rails, and I'm really not sure that you won't hit scaling problems, or problems interfacing with some old legacy thingamabob, or problems finding programmers who can understand the code, or whatnot...
...I for one am scared of Ruby because (1) it displays a stunning antipathy towards Unicode and (2) it's known to be slow, so if you become The Next MySpace, you'll be buying 5 times as many boxes as the .NET guy down the hall.
I'm sure Joel anticipated the backlash. Perhaps it was even the motivation behind the posting: The resulting torrent of discussion brought quite a few visitors to his blog, and earned him a lot of inbound links, both of which have definitely helped with his new business ventures. No publicity is bad publicity, they say, especially if it's timed to coincide with the launch of a new job board (as an aside, Ruby, Wikipedia, OSX, Python, Lisp, and ERLang are all terrible! People with the letters J or P in their names are jerks!).
Ruby is still new enough, and with a small enough community, that many of its users double as evangelists -- think of the Amiga computer, the BeOS operating system, or any other contextually-superior alternative embraced by a small enough group that many feel an ego-intersection with the technology, motivated to defend and advocate it when the opportunity arises. Linux once had such an attack-dog core of rabid enthusiasts, though as the user base has grown, and it has become more pedestrian, you really have to target a Linux-niche (such as a little used distro) if you're aiming to stir up a hornet's nest.
That entire lead-up was just some context for the actual topic of this entry: So-called premature optimization.
A common response to Joel's complaint that Ruby is slow or resource inefficient is the frequently incanted declaration that such complaints are nothing but "premature optimization!"
I've seen the same deflection shield used to defend abhorrent database designs, convoluted, overly-abstracted class designs or message patterns, and virtually anything else where a realist might proactively ponder "but won't performance be a problem doing it like this?", only to yield the response "You know, premature optimization is a classic beginners mistake!"
If you don't want to be lumped in with beginners, the lesson goes, it's best to pretend that performance simply doesn't matter. We'll cross that bridge when we get to it.
Premature optimization is the root of all evil (or at least most of it) in programming.
Donald Knuth
I remember the early days: I once spent about 16 work hours optimizing a date munging function, increasing its performance from something like 2 million iterations per second to 4 million iterations. In the grand scheme of things, the performance difference was completely negligible, but from the perspective of artificial benchmarks it seemed like tremendous progress was being made.
That was premature optimization.
Indeed, anyone who's done time in the software development industry can identify with what Mr. Knuth was saying, probably having been involved with (or responsible for) project plans gone awry when efforts focused on highly-complex caching infrastructures, or ultra-optimizing some seldom used edge function.
Yet what is arguable, and situation specific, is deciding what qualifies as premature, versus what is simply proactive, predictive, professional performance prognostications.
NOT ALL PERFORMANCE CONSIDERATIONS ARE PREMATURE OPTIMIZATION!
While there is no doubt that there is such a thing as premature optimization -- it is an evil distraction that sidetracks many projects -- there are critical decisions made early in a project that can cripple the performance potential (both resource efficiency, and resource maximum), making later optimizations enormously expensive, if not impossible without an entire rewrite.
Whether it's heavily normalizing the database (or its nefarious doppelgänger, the classic database-within-the-database: "This single table can handle anything! Just put a comma separated array of serialized objects in each of the 256 varbinary(max) columns! Look at the flexibility! Query it? Don't you bother me with your premature optimizations!"), creating an application design that's incongruent with caching, or choosing an inefficient platform.
There are credible performance considerations that need to be addressed at the outset, and revisited as development proceeds. It is absolute insanity, and entirely irresponsible professionally, to simply stick one's head in the sand and hope that some magical virtual machine improvements or subcolumn indexing decomposition and querying technology will occur before deployment, or before the economics of scaling come into play.
And speaking of scaling, the canard that the horizontal-scalabilty intrinsic with most web apps (unless you really screwed up the design -- as many people do -- and made horizontal scalability impossible) makes the problem a nonissue is absurd: Perhaps if your project has a high transaction value then you have the luxury of adding more servers to serve a small number of clients, yet for most real-world projects adding resources is a big, big deal. And it isn't simply the cost of a low-end Dell 1850: Whether you're colocating or hosting in an expensively rigged corporate server room, the cost of each server is substantial.
You end up in the dilemma that you're financially (or physically) limited to a set quantity of resources, having to limit or scale-back the functionality provided to each user due to the inefficiencies caused by early decisions. "Sorry we can't implement that cool AJAX type-ahead lookups because the callbacks would kill our servers - we're already saturating them with our stack of inefficiency, so there's no overhead left."
I think the lackadaisical attitude towards efficiency is a result of experience derived from countless unvisited or seldom used web apps deployed across millions of PCs, colocated with equally as spartanly used peers. When a site sees a dozen visitors in a day, it's easy to declare that performance is a seeming nonissue nowadays - that it's only a concern for game programmers and nuclear modelling engineers. Then one day the page gets mentioned on Digg or Reddit or Slashdot or BobOnHardware and in that potential moment of glory the app falls over and dies, again and again.
None of this really has anything to do with Ruby. Personally I haven't used it beyond the tutorials, though I do know that it does very, very poorly on the standardized benchmarks. However it is distressing seeing so many people dismiss Joel's comments (or comments about Python, or ERlang, or XML, or any other technology) as premature optimization.
Why is it that "90% done" (and
its partner in crime - the ubiquitous "almost done!") is
the progress report for virtually any project, over virtually all
of its life-cycle?
Why has 90% become the fictional number of choice? Why not the more conservative 80%, or the bolder 95%? Given that it usually has little correlation with reality, they're just as real.
Projects should be reported as 87% done. Even when there's the ominous "we'll solve that problem when we get to it" task maliciously eyeballing you from later in the project plan, or the "it doesn't work and we have no clue why?" runtme reality, still say 87% with confidence and pride.
The well-known Hanlon's razor states-
Never attribute to malice that which can be adequately explained by stupidity.
While it's a seemingly pessimistic perspective on the capacity of one's fellow human, it is an undeniable truth that we often mistake carelessness, thoughtlessness, or outright ignorance as malicious intent.
Yet it's a more serene existence -- to the benefit of one's lifespan -- to simply assume that the person who dangerously cut you off on the road, for instance, is just a moron deserving a bit of sympathy, rather than considering him or her a roadway foe challenging you to a deadly battle of wills.
From a software development perspective, however, I think an inverted variation would serve the industry well.
Never blame others until adequately considering the possibility of your own (negligence | carelessness | stupidity).
As a general rule, the denizens of the software development profession -- it certainly isn't limited to this profession, but given that it's the general focus of this blog it's the one I comment upon -- have a tremendous capacity for assuming the worst of others, far before considering the unsavoury prospect that maybe -- just maybe -- it's actually their own mistake or lack of knowledge that's the cause of the issues they face.
It is far too common to cast a wide net of blame, declaring that Microsoft's products are screwing up, the documentation is all wrong, the server is malfunctioning (maybe because of cosmic ray particles toggling memory bits), the installation tool is a dud, and one's coworkers are surely idiots insidiously and maliciously changing code just to make one's brilliant code poetry fail to achieve its momentous glory.
After such hand-waving, blame-weaving dramatics, in most cases the developer realizes that they skipped an obvious step in the instructions, or they forgot to get latest of the entire branch, or they were copying the wrong file or looking at the wrong folder or running the wrong executable, or they were using the class entirely wrong, or they completely misunderstood how the operating system security system works, or they set a global setting a week back that completely changed how the application functions, or they ignored the email and documentation and group meeting detailing system changes, and so on.
They quietly retreat -- don't expect a retraction -- until they repeat the same mistake the next time something doesn't go exactly as imagined.
I've met these people in the industry. I've worked with these people. I've been one of these people.
I think we can all relate to situations where we've railed against a company, a product or a person, only to have the embarrassing realization that we were simply doing something dumb.
And it's not even that doing something dumb is noteworthy: We're humans, and we're bound to make mistakes. The problem is that we often don't even give a moment of time to even the possibility that we could be at fault, instead just assuming the worst of others.
It's far more beneficial to both productivity and team morale to have a little bit of self-doubt in these situations: Assume the worst of yourself before assuming the worst in others.
After hashing out this entry, I wondered why it wasn't appearing on the public blog. After berating various products and services, I remembered that I recently outsourced my DNS (for the reasons described here, using the service recommended by a reader), and forgot to add an entry for the FTP server. Whoops.
As a completely offtopic aside, one of the reasons I switched DNS providers was to have support for a domain SPF record. While it does nothing to stop the tide of pump-and-dump investment scam spams, at least it allows those recipients utilizing the service to immediately dump-bin those that claim to come from yafla (I get about 100 bounces a day, and who knows how more actually get through), knowing that the from: address was forged.
The CYA Application Security Model is the practice of implementing so-called security obstructions primarily to absolve the vendor from blame if something goes awry during everyday operations. This model is usually sold under the pretense of improving user education, or encouraging safer application usage, but that's of minimal actual concern (in reality the opposite outcome -- more risky application usage -- is probable).
An example of the CYA ASM in action is one that pops up a seemingly endless stream of confirmation "Are you really sure you want to do that?" dialog boxes, warning the user against doing what should be completely normal, benign activities.
This pestering, progress-inhibiting assault of a million warnings and confirmations application behaviour is certain to cause the user to enable a "turn off all security" mode (for instance adding every site to "trusted sites" in Internet Explorer), paradoxically making the security situation infinitely worse, but for the vendor this often the desired outcome: At least then they can smirk and blame it on the userbase if what should be a harmless activity compromises their machine.
Didn't you heed the "The Internet could be harmful to security!" dialog box when you attempted to connect to the internet?
Software development is a difficult task to meter.
It's not for lack of trying.
For decades consultants have been evangelizing methods which, they claim, would allow an unskilled, casual observer to easily measure and compare productivity in a contextually agnostic way.
Their ultimate goal: To allow a drop-in manager, with only a superficial knowledge of the activities, skills, and complexities of a task or project, to easily compute metrics by which to dole out the frequency and intensity of whippings and rewards.
[Aside: Before anyone incorrectly presumes any of this is critical of software development managers as a group or individually, realize that it is nothing of a sort: I start with a brief analysis of the goal of such simplistic measures -- most organizations would like positions, including management, to be lower-skill and easier (cheaper) to fill, and such a simplification of the role is definitely in their interest, just as many dream of the panacea of no-skill, factory-type software development -- and then actually question the fact that developers themselves are often guilty of quoting these metrics. 9 times out of 10, developers have only themselves to blame for a lot of the problems with the profession. This is not yet another boring us-versus-them war cry pandering piece, like those that top the meme charts frequently]
| February ConsultaMark(SM) ProductoMatrix(TM) Results | ||
|---|---|---|
| Cog | Output | Proposed Action |
| Tom | 117.6 | 2% Raise At Year End |
| Amy | 111.2 | 1% Raise At Year End |
| Jacob | 92.7 | Forced Overtime |
| Serene | 85.5 | Replace LCD with the 14" VGA monitor from the server room |
| Nellis | 68.0 | Creative Dismissal |
The same methods -- if they worked as promised -- could be used to chart project progress ("We're 7868.2 ConsultaMarks towards the 11273.9 estimated for the entire project!").
Instead of relying upon the from-the-trenches observations of Randal the development group manager -- a grizzled vet of software development who manages with a hands-on style by becoming intricately aware of the domain challenges and unique contributions of each team member -- Lynn, the parachuted in middle manager, wants some simple numbers that can be sorted like her mutual fund returns, giving her some available sacrificial lambs when the next diversion-from-massive-executive-fumbles headcount reduction comes due.
Many proposed solutions have come and gone, with the most persistent being the infamous SLOC (Source Lines of Code)/LOC measure.
SLOC, if you haven't been afflicted with it, is an easily computed count of the number of lines of code in a given project/component/object (although first you have to agree on the definition of a "line of code", and this is a point of debate among SLOC champions). It's often used to count the number of lines of tested, complete code added by a particular contributor (easily accomplished with many source code repositories), allowing for the easy creation of nice little charts like the one above.
SLOC does have some quasi-legitimate uses: Given a common programming language and domain complexity, SLOC magnitude differences have a moderate correlation with general project size, and at the method level it is a rough indicator of gross complexity (see the article FxCop & Cyclomatic Complexity for a discussion of a loosely related metric, which is the number of intermediate language instructions generated from a method).
Applied at the individual or group level, usually as a cheap substitute for good management and project awareness, SLOC measurements are likely to encourage very destructive behaviors: Copy/paste coding, limited reuse of existing code found elsewhere in the organization and the industry, little motivation to prune code where necessary, overly convoluted coding, motivation for employees to only take on trivial coding tasks, and so on.
Envision a system that ranked cooks by the number of lemons they use to provide a restaurant's service each night: You're going to end up with a lot of dishes featuring copious stacks of lemons, even if ultimately it compromises the quality and organizational health of the establishment. While in some situations you could conceivably roughly compare overall restaurant success by the number of lemons they go through in a period, the comparison only holds true if all else remains equal (e.g. if otherwise the restaurants are very comparable, such as two restaurants serving Thai food): A deli restaurant might use very few lemons despite a healthy customer turnover, where an equally successful Greek restaurant might go through hundreds.
Far more logical would be to measure the number of dishes served -- while still imperfect, it would be much more useful than the LemonMetric. There is no comparable measure, with a similar level of granularity, as "dishes served" in software development (don't even think of mentioning the highly ambiguous "function point" metric as a simile).
"Geez...we all know that there are significant problems with the SLOC metric!" many will inevitably retort. "This is old news. You're preaching to the choir!"
"...but having said that, I saw a recent article that claimed that the average developer does {X} lines of vetted code a year. Are they really that slow? Me and my team must generate at least 20{X} a month! I hear that some superstars are responsible for 200,000 SLOC a year. They must be awesome!"
Comments just like that are probably being typed into a TEXTAREA at this very moment.
Why do so many comments about productivity -- even in the comfort of secret No-PHB hideouts -- inevitably elicit gloating commentary about personal SLOC accomplishments? Why do we hear gushing superlatives about the "superstars who push out 100s of thousands of SLOC a year"?
Why do so many in this industry perpetuate this destructive myth?
Let me flip this metric on its head, and state that, if anything, for a certain domain of project, and a certain class of developer, a high rate of SLOC can actually indicate poor programming practices.
In the nascent days of software development, many teams had a compiler or an interpret and that was pretty much it. They were responsible for building the majority of functionality from scratch. The pace of SLOC creation was tremendous (especially given that much of that implementation was trivial, allowing them to code as fast as they typed. Little time needed to be spent problem solving or planning: It doesn't require a superstar to code yet another string copy function).
As time went on, organizations compiled volumes of reusable internal code for all of their domain specific problems.
From an individual developer perspective, no longer was it acceptable to simply "run and start coding". Now you had to spend some of your time learning, assessing, and implementing shared internal code in your projects.
And it wasn't just inhouse: The frameworks and libraries provided with our tools have been growing by leaps and bounds, immediately solving a huge range of traditional problems and tasks with well tested, robust, feature rich solutions.
In the industry as a whole, code sharing has become widespread, with excellent code being available for virtually all common (and even uncommon) tasks.
So many solutions are available in the industry and supplied within our libraries/frameworks, that even organizational code reuse can be indicative of a problem.
Yet somewhere out there someone is hand-writing an FTP client implementation. Somewhere developers are wasting a tremendous number of man-hours by poorly, and unintentionally, duplicating code that exists in the frameworks and libraries that they're already using, or which can be easily found in license compatible open source projects.
A part of the reason for this is laziness -- it's a real bother having to look through the documentation and amongst search engine results, and that's hardly as much fun as just coding. Another part of the reason is a classic perception flaw that virtually all developers suffer from: Endless optimism about the capabilities and quality of the code we produce -- which we always think we'll finish much quicker than we really will -- coupled with an unreasonable pessimism about the applicability or worth of code we could source from another group in the organization, or from an external source.
I'm often guilty of these failures of perception, as are the overwhelming majority of developers.
Rarely does a developer actually tread across new ground (and I'm certainly not just talking about business back-end "CRUD" developers -- even in signal processing, embedded development, game development, and other less common branches of software development, most of the "solution" is the integration of existing work in novel ways, adding an envelope and façade of customization).
For the rest of us, our job is partly to generate the generally small amount of niche-specific code, usually aiming to build it with the most concise -- aka minimal -- code necessary, with the bulk of our time being in the analysis and integration of the extraordinary volumes of available solutions.
Where niche, custom code is necessary, generally it will be for a non-trivial task, and the SLOC pace will be unavoidably glacial.
For the overwhelming majority of developers in the industry, the only value of SLOC measures is as a warning sign, not an indication of progress.
We've been using Microsoft's Team Foundation Server for version control and basic work item/requirements/bug tracking for about 9 months now. All in all it's a good tool, though really it still feels like a version 0.8 beta that got pushed out the door a little early.
The Good
The Bad
The
application tier is incredibly fragile. If an
enterprising team member decides to do some clean-up directly in
the TfsVersionControl data-tier database (getting around missing
functionality in the tools -- for instance there is no way to
permanently delete, aka destroy, in the tools, remarkably,
so if someone checks in a 500MB file and you want to remove it,
you're forced to do it directly in the database), you will discover
that a single missing related record -- the database doesn't define
or enforce foreign-keys, so it isn't going to block the DELETE
command there -- will cause the application tier to die a
hundred deaths, excepting out on null values and other inanities.
This is made far worse by the fact that the application tier caches
a lot of lookup data, so check-ins/outs will work for a while after
these related records were moved, making a simple database rollback
impossible. Instead you need to go through every database manually
rationalizing all of the data, determining where the application
tier is dying.The Ugly
All in all it's pretty decent, but I think they called it done a little early.
Every now and then I browse over to the popular Coding Horror blog: It's like turning on an episode of the A-Team and collapsing into a comfortable couch on a hot summer day, ice cubes clinking around in a sweaty cup of lemonade.
It's a nice way to zone out, distracting my consciousness from a vexing problem in hopes that a background process running in the Brain.ThreadPool completes and returns a solution.
A recent entry, C# and the Compilation Tax, piqued my interest, reminding me of a prior entry I posted about Edit and Continue (where I discussed the potential over-reliance on automated helpers.)
In his post, the Coding Horror author observes that in actual use .NET has evolved into a single language environment, with C# increasingly dominating (though it's doppelgänger, VB.NET, is only a marginally different variant -- C# is the Lexus ES300 to VB.NET's Toyota Camry).
Who knew? Would you also believe that the Java Virtual Machine can be targeted by languages other than Java?
Cats and dogs getting along! Microsoft and Linux making up (see also: Microsoft the Patent Troll)! Robert Mugabe the economic development champion!
He goes on to note his disappointment that the Visual Studio IDE doesn't support automatic, continuous compilation for C# projects, which would enable it to quickly highlight egregious mistakes in his code.
In the comments to his post, I offered up the opinion that the reliance upon such a piece of functionality might be detrimental to the craft of software development.
Jeff,
I encourage you to go a week with minimal automated code-correctness checks. When you've gotten to the point where you rely upon continuous automated checks (which can miss a tremendous number of problems where the code is syntactically and type correct, but logically broken), you've seriously harmed your craftsmanship.
Dennis Forbes on May 15, 2007 07:10 AM
Mr Atwood replied-
Dennis,
Ah yes, that old logical trap. Better tooling makes us weak! It's a crutch we begin to rely on that cheapens our craft!
This is even debunked all the way back in Mythical Man-Month:
"There is a widespread recognition that debugging is the hard and slow part of system programming, and slow turnaround is the bane of debugging. So the logic of interactive programming seems inexorable."
http://www.codinghorror.com/blog/archives/000026.htmlAnd then, as for the logical fallacy of "easy tools make you weak":
"So let's get real. Bad programmers write bad code. Good programmers write good code. RAD lets bad programmers write bad code faster. RAD does NOT cause good programmers to suddenly start writing bad code."
http://www.codinghorror.com/blog/archives/000090.htmlThis isn't a zero-sum game; better tools let good programmers work faster. The bad programmers, well, it doesn't really matter what you give them because the output will be the same: disappointing.
Logical fallacy...zero-sum...a reference to the Mythical Man-Month...add in "orthogonal" and you've got a Score:5 on your hands.
In between appeals to authority -- Jeff is a big fan of Code Complete and the Mythical Man-Month, always ready to quote them as the indisputable, final word on all matters relating to software development -- Jeff makes the critical, err, "logical fallacy" of presuming that my questioning of his automatic compilation dependency (to avoid assigning literal integers to strings, apparently) is an indictment of all accouterments of software development.
Hardly, Jeff. It might make for a convenient strawman, but it's certainly not the argument being presented.
I'm not calling for a return to punch-card programming, or even back to the days when we'd reference our 50lbs of Visual C++ reference material to lookup the parameters of an API function.
I even think IntelliSense is quite a great little bit of functionality.
When you're randomly typing garbage into your editor, complaining that the continuous invocation of the compilation shortcut is cramping your style (and your fingers), however, you've developed a serious problem.
Sometimes these features are there to coddle a beginner, carefully keeping them within the painted lines and away from the dangerous electrical sockets along the wall. That would explain why it was a more important feature in VB.NET than C#...not that VB.NET is any more trivial -- it's just a syntactic variant -- but it is the language that beginner programmers are generally guided into.
And to his last point: My experience has been that the best developers just naturally start using less and less "helpers", to the extreme where you have incontestably great developers like Linus Torvalds arguing against fundamental helpers like interactive debuggers.
I'm certainly nowhere near as extreme as Mr. Torvalds -- indeed, apparently Linus has softened his stance somewhat -- but I wouldn't dare to claim his experienced perspective "debunked" because Fred Brooks -- operating in an environment where what is defined as "immediacy" would be glacially slow by today's standards, and whose metrics had nothing to do with Edit and Continue or continuous automatic compilation -- can be selectively reinterpreted and applied against a completely different world.
Sorry, Jeff, but I don't buy the infinite monkeys on an infinite number of keyboards model of software development. I can only envision tools like continuous compilation and edit and continue as the hand-holding of beginners, and the crutch of hacks.