The story of Markus Frind is not a new one around development circles: some guy creates a remarkably unpolished, seemingly unsophisticated dating website and in short order is bragging about the million-dollar checks he's getting from Google AdSense.
Still, you owe it to yourself to read the article about his exploits in January's Inc. It is a fascinating story of internet success against the odds, and of a site that serves up 1.6 billion pages per month on a small amount of inexpensive hardware. That article references Markus' blog entry from 2006, where he explains how his extraordinary success story began.
[Imagine that I insert some drawn-out blowhard "lesson" to be learned from Mr. Frind's success here, allowing me to justify making an entry that is basically nothing more than a link, all while pretending that if you follow these simple steps you too can achieve the same results]
When I do work that affects the web presentation of projects I'm involved with, I occasionally hop down to the menu item "Validate Local HTML" in Firefox, a function that is available when you have the web development tools installed. You can also access it via Ctrl-Shift-A, and of course you can always run the validator directly, but that seemingly tiny improvement in convenience can dramatically increase how often you actually use it. In a weak sort of TDD, it is a constant sanity test of at least the fundamental HTML validity of the generated presentation, and I always strive to get it to the rewarding green no-errors-no-warnings state.
Does it really matter though? Ultimately what matters is whether the site renders as close to the intended result as possible in the major browsers, and most of them happily overlook even egregious errors. (Internet Explorer was criticized early on for being so forgiving, but given its dominance the other browsers really had no choice but to allow the same sloppiness. Most web publishers weren't about to re-engineer their site just to ensure that it displayed correctly in Opera, for instance.)
Out of curiosity I decided to check some other sites to see how many ensure that their (X)HTML is clean. The following are the results as they stand at this moment, though of course the state will change as content is added or removed (though a clean site is usually clean by intention, with new content automatically filtered to keep it valid).
(I searched around for more good examples to sit in the PASS category, but sadly they are very few and far between)
Should this be normal?
No, it shouldn't.
Some of the errors in the mechanically generated HTML are simply inexcusable, and testify to the general level of sloppiness in the web industry.
Check your HTML. Ensure it conforms to the specs it purports to obey, or accept defeat and step back to a less-demanding level. With tools like one-keystroke validation and the auto-cleanup of HTML Tidy (which is available in module form, allowing you to clean up content mechanically, inline in your site code; see this entry for an example of using Tidy from .NET code), there's simply no excuse.
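If you want a rough version of that sanity test in your own build or test scripts, something like the following could do. To be clear, this is only a minimal sketch in Python using the standard library's `html.parser`; it is neither the Firefox validator nor HTML Tidy, and it checks nothing against a DTD. The names `TagBalanceChecker` and `check_markup` are my own invention for illustration. It simply assumes an XHTML-style rule that every non-void element must be explicitly closed, in order, so it will complain about HTML that legally omits optional end tags such as `</p>` or `</li>`.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records tags that do not pair up."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag name, line number where it opened)
        self.problems = []   # human-readable descriptions of what went wrong

    def handle_starttag(self, tag, attrs):
        # Void elements such as <br> have no end tag, so never push them.
        if tag not in VOID_ELEMENTS:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1][0] == tag:
            self.open_tags.pop()
        else:
            # The end tag does not match the most recently opened element.
            line = self.getpos()[0]
            self.problems.append(f"line {line}: unexpected </{tag}>")


def check_markup(html_text):
    """Return a list of tag-balance problems; an empty list means none found."""
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    problems = list(checker.problems)
    for tag, line in checker.open_tags:
        problems.append(f"line {line}: <{tag}> is never closed")
    return problems


if __name__ == "__main__":
    for problem in check_markup("<p>hello <b>world</p>"):
        print(problem)
```

An empty list from `check_markup` only means the tags pair up; it says nothing about whether the document is valid against its declared doctype, so a real validator is still the thing to aim at.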
Many will wave off such criticism, declaring that if it renders fine, that's what really matters. Yet the concern about purity has more to do with the code maintenance process, and with ensuring that an appropriate amount of care is put into the product, in much the same way that you should strive to have 0 warnings in your projects even if the compiled output works fine regardless. It is the same reason I try (albeit with failures at times) to avoid misspellings and typos, even if the message could be successfully conveyed with them.