Tuesday, February 21 2006

As a software developer, I'm drawn to complexity, and I generally view "hard" problems as much more worthwhile, from a business perspective, than easy problems.

For instance I never considered many of the "successful" .COM-type ideas as worthwhile, and I naturally feel the same about many of those percolating up via Bubble 2.0, because they either seemed too obvious, or too easily duplicated (aside from the fact that most of them haven't a hope in heck of ever having a sustainable revenue model...but that's beside the point -- If you can get Yahoo or Google to buy you out with big dollars, it doesn't matter if your flighty userbase would never pay a penny, and that they would never tolerate a single ad impression).

As such I discarded a lot of "neat ideas", only to see them bring success to someone else. I was keeping my eye out for something both unique and difficult.

Yet often it's the simplest of things that hold the most value to people.

Just over a year ago, late December 2004, I was working on some custom C++ JPEG parsing logic (scary pointers and all), and noticed that some of my test files were bloated up with a lot of extraneous data. We're talking 4KB JPEGs that had an extra 60KB+ of data appended to them. Apart from EXIF data -- useful at times, but completely irrelevant at other times -- Adobe Photoshop was particularly notorious for stuffing files full of worthless application blocks.

Many of the images you find on the web have all of this extraneous info, slowing transfer times, increasing server loads, hiking bandwidth bills, causing pestilence and suffering (okay I'm going a little overboard).

As such, I gathered up a ridiculously small amount of the parsing code, compiled it, and "released" it as the rashly named PureJPEG. I expected it would see maybe a dozen downloads from nitpicking webmasters looking to ultra-optimize their users' experience.

Over the past year, over 40,000 people have downloaded this utility directly from yafla. It's been mirrored on quite a few sites (particularly in Russia, for some reason), so I have no idea how many worldwide have downloaded it, but presumably a greater number still.

I would have never imagined that something so simple would have filled such a niche, and I'm a little embarrassed about how trivial of a micro-project it really was, but there it is. As far as little utilities go, it's been a stunning success.

   
Tuesday, February 21 2006

[The following repost of "legacy" yafla content is preparation for the long awaited publishing of part III, which will be published through this medium. To give consistency, I'm reposting Pt I and II in this format]

Examples given in this series will reference the sample database "Northwind". It can be installed via the script found on the Microsoft website.

The Basics? We're All Professionals Here!

Much of the material covered in this outing will be old hat for a lot of developers, but is nonetheless worth a recap -- Even among the pros there remain misunderstandings and conflicting information about the fundamentals of databases, and the true magnitude of impact they have on systems. I've intentionally authored this series conversationally, as opposed to a "high impact hit list!", however if you'd just like a brief summary list you can find one at the end.

Data Size

This needs to be addressed as there is a growing camp of "data wasters" that erroneously believe that the larger the amount of waste, the more Enterprise Ready a solution is (capitalization used derisively). I step into this quagmire knowing full well that this section will yield me some "you're a dummy!" responses from some newly earned adversaries, however that's a price I'm willing to pay if I can save but one byte tree.

Minimize the size of your data. Don't use GUIDs where they aren't necessary (e.g. where you don't really need global uniqueness/replication), and don't use a bigint where an int or a smallint will suffice. Don't use a smallint where a tinyint will suffice or an nvarchar where a varchar would be fine. Use the smallest type that is reasonable for the field. Don't invent vague packing technicalities or native type size issues in an attempt to justify oversized data.

Of course you should plan for realistic growth, and I'm not advocating that you use a tinyint to store your CustomerID field, but keep the rational, real world in mind when designing your applications - are you really going to exceed 2 billion users? Are there going to be more than 32767 languages in your application? Is it likely that we're going to a new calendaring system that might have 2 billion months?
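
As a rough sketch of what right-sizing looks like, compare two declarations of the same lookup table. Both tables and all of their columns are hypothetical and are not part of Northwind; the byte counts in the comments are the fixed storage sizes of the types.

```sql
-- Hypothetical lookup table declared "just in case".
CREATE TABLE dbo.LanguageWasteful
(
    LanguageID   uniqueidentifier NOT NULL PRIMARY KEY,  -- 16 bytes
    SortOrder    bigint           NOT NULL,              -- 8 bytes
    LanguageCode nvarchar(10)     NOT NULL               -- 2 bytes per character
)

-- The same table using the smallest reasonable types.
CREATE TABLE dbo.LanguageCompact
(
    LanguageID   smallint    NOT NULL PRIMARY KEY,       -- 2 bytes, values up to 32,767
    SortOrder    tinyint     NOT NULL,                   -- 1 byte, values 0 to 255
    LanguageCode varchar(10) NOT NULL                    -- 1 byte per character, e.g. 'en-CA'
)
```

The varchar column assumes the codes only ever contain characters from the database's code page; where that isn't true, nvarchar is the type that is actually necessary.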

Evaluate if it might be an acceptable compromise to simply use these large types in your façade while actually storing smaller types in the actual database. This would give you improved performance, and would allow you to easily upsize your data types in the future in the unlikely event that it becomes necessary.
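
One way such a façade might look, as a sketch only: a stored procedure exposes the wide type to callers while the table stores the narrow one. The table dbo.LanguageStore and the procedure dbo.GetLanguage are hypothetical names invented for this illustration.

```sql
-- Hypothetical storage table with a compact key.
CREATE TABLE dbo.LanguageStore
(
    LanguageID   smallint    NOT NULL PRIMARY KEY,
    LanguageCode varchar(10) NOT NULL
)
GO

-- Facade: callers pass an int, the table stores a smallint.
CREATE PROCEDURE dbo.GetLanguage
    @LanguageID int
AS
    DECLARE @StoredID smallint
    -- Convert the parameter once, rather than converting the column for every row.
    -- An out-of-range value raises an arithmetic overflow error here.
    SET @StoredID = CONVERT(smallint, @LanguageID)

    SELECT LanguageID, LanguageCode
    FROM dbo.LanguageStore
    WHERE LanguageID = @StoredID
GO
```

If the key ever needs to grow, the column type and the local variable change while the procedure's signature stays the same.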

Clearly there are cases where large data types are legitimately warranted, however too many database architects absolve themselves of any responsibility for efficiency by making everything a GUID or a bigint "just in case" (GUIDs have a substantial creation cost as well, in addition to the obvious storage and I/O costs. While GUIDs once used the available network card MAC address as the foundation and generally sequentially increased in value on each new GUID, in current Windows variants GUIDs are basically random numbers -- used as a clustered primary key they can lead to endless data reordering. NOTE: See Sequential GUIDs in SQL Server for solutions for this problem).
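
For the cases where a GUID key really is warranted, one option is the built-in NEWSEQUENTIALID() function. This is a sketch that assumes SQL Server 2005 or later (the function does not exist in SQL Server 2000), and it is not necessarily the approach the article referenced above takes. The table dbo.DocumentStore and its constraint names are hypothetical.

```sql
-- Hypothetical table. NEWSEQUENTIALID() can only be used in a DEFAULT constraint.
CREATE TABLE dbo.DocumentStore
(
    DocumentID uniqueidentifier NOT NULL
        CONSTRAINT DF_DocumentStore_DocumentID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_DocumentStore PRIMARY KEY CLUSTERED,
    Title      varchar(200)     NOT NULL
)
GO

INSERT INTO dbo.DocumentStore (Title) VALUES ('First')
INSERT INTO dbo.DocumentStore (Title) VALUES ('Second')

-- On a given machine the generated values generally increase,
-- so new rows tend to land at the end of the clustered index.
SELECT DocumentID, Title FROM dbo.DocumentStore ORDER BY DocumentID
```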

Why does it matter? In real enterprise apps before you know it there are tens or hundreds of millions of rows throughout your database, and these rows need to be read from, and written to, the glacially slow storage subsystem constantly -- given this finite resource, doesn't it seem logical that 1MB of I/O carrying 20,000 records is better than 1MB of I/O carrying only 5,000 records? Of course it is. Isn't it better that 10% of your database can fit in the memory cache rather than just 5%? Of course.

Don't be lulled into a false justification of large data types by running ridiculously small benchmarks, where all of the data exists in the memory cache and the I/O is dwarfed by the computational element of the query, yielding "only" a performance hit of 10% or so with larger types -- when your database gets to real enterprise size, size really does matter. That cluster indexed GUID primary key not only makes the row bigger, it makes every non-clustered index bigger (and thus slower) as well, and when the weakest link SAN is running at 100%, you'll regret every wasteful byte.
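
To put numbers on this for your own tables, the system procedure sp_spaceused reports the row count along with the space taken by the data and by the indexes. A minimal example against Northwind:

```sql
USE Northwind
GO
-- Reports rows, reserved, data, index_size and unused for one table.
EXEC sp_spaceused 'Orders'
```

Running it before and after a data type change gives a crude but honest measure of how much I/O the change could save.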

Indexes

Many SQL Server performance problems are rooted in missing or inappropriate indexes, or alternately unused indexes. This is often true for databases thrown together by front-end experts unhappily tasked with supplying the back-end database, just as it's often the case for those carefully crafted by highly-focused database professionals.

An understanding of indexes, and a focus on their application, is paramount for high performance databases. Not only is it critical to create the right indexes, it's important to craft your access to properly utilize the indexes that are there.

A Bookshelf Full of Examples

Non-Clustered Indexes

Indexes in reference books serve the same purpose (and share many of the same traits) as those in the database world -- by referencing the index you can follow a shortcut to a particular piece of information, seeking directly to that specific page, versus going from page to page scanning the contents. In the SQL Server world these sorts of indexes are called non-clustered indexes (or secondary indexes) -- they are a subset of the data ordered for a specific purpose, containing a pointer to where the real data row can be found.

In the Northwind database an example of a non-clustered index is ShipPostalCode on the Orders table. This index sorts by the ShipPostalCode, and may be used for a query such as the following.

SELECT * FROM Orders WHERE ShipPostalCode = '05022'

If you take a look at the execution plan (press Ctrl-K, or choose the option "Show Execution Plan" under the Query drop-down in Query Analyzer; when you execute a query with this option enabled a new tab, Execution Plan, will appear beside the results tab), you can see that an index seek took place, followed by a bookmark lookup to find the actual data row. If we run SET STATISTICS IO ON on the connection before running the above query, we'll get some statistics on the I/O used to satisfy the query, which will be as follows.

Table 'Orders'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0.

Compare this to the following query, which simulates having no index.

SELECT * FROM Orders WITH(INDEX(0)) WHERE ShipPostalCode = '05022'

In this case the execution shows a full table scan, and our IO statistics reports the following.

Table 'Orders'. Scan count 1, logical reads 21, physical reads 0, read-ahead reads 0.
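
Put together, the comparison can be run as a single script. This only assembles the statements already shown above; the I/O figures appear on the Messages tab and may differ from the numbers quoted here depending on your copy of Northwind.

```sql
USE Northwind
GO
SET STATISTICS IO ON
GO
-- Uses the ShipPostalCode index: index seek plus bookmark lookup.
SELECT * FROM Orders WHERE ShipPostalCode = '05022'

-- INDEX(0) forces a scan of the table itself, simulating a missing index.
SELECT * FROM Orders WITH(INDEX(0)) WHERE ShipPostalCode = '05022'
GO
SET STATISTICS IO OFF
GO
```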

Without the index there was significantly more I/O, and the differential would be vastly worse if this were a large enterprise-sized table instead of a small sample table. To make matters much, much worse such a table scan will block on every single exclusive page or row lock on the table, waiting for the data to be unlocked just to be able to verify if the wanted information is contained within, while the index seek knows that the locked data isn't the data that it's looking for and is unaffected. Try running the above two queries in two separate Query Analyzer windows while the following script is running in yet another (increase the WAITFOR delay if you can't jump between them all within 30 seconds).

BEGIN TRAN

UPDATE Orders SET OrderDate = '1997-08-27'
WHERE OrderID = 10647

WAITFOR DELAY '00:00:30'

ROLLBACK TRAN

The first query, using the index, instantly returns the result regardless of the row lock, which is logical given that the row being locked is not what the query is looking for, while the second query, not using the index (which can happen because no index exists, or the index isn't deemed the best choice) blocks until the other connection's lock is released. Not using an index is not only vastly less efficient, it can significantly worsen blocking problems as a database scales (or rather tries to scale).
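
To see the blocking for yourself while the scan is waiting, the system procedures sp_who and sp_lock can be run from one more window. A small sketch:

```sql
-- Run in a separate window while the forced-scan query is still waiting.

-- One row per connection; a non-zero value in the blk column is the
-- spid of the connection that this one is blocked by.
EXEC sp_who

-- Lists the locks each spid currently holds or is waiting for.
EXEC sp_lock
```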

Clustered Indexes

Returning to our analogy, many books take it a step further and order the content, making the data itself an index of sorts. A cookbook might sort by main category and then dish name, while a phone book famously sorts by [city, last name, first name] (in that order). Thus if you want to search based upon the sorted data, it's extremely efficient -- in the example of the phone book you can very quickly seek to the desired city and last name, scanning a small number of records for the desired person. In the SQL Server world this sort of index is called a clustered index (the sorting of the data itself), and for obvious reasons you can only have one clustered index in a book, or on a table. The primary benefit of a clustered index is that all of the table data is immediately available for every index match -- no dereferencing is necessary.

In the Northwind database consider the following query.

SELECT * FROM Orders JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID WHERE Orders.ShipPostalCode = '05022'

If you look at the execution plan, the Order Details data is grabbed via a very efficient clustered index seek.

Clustered indexes aren't all milk and honey, though. For instance imagine that you're the hard working typesetter maintaining the layout of the phone book, and you've carefully arranged all of the entries on the respective pages. Every time a new entry comes in and doesn't coincidentally fit right at the end of the sort order, or someone changes the information on an existing row in a way that alters the sort order ("Smith" changes his name to "Jones"), you need to reorganize some pages to make space. This same data-churn problem occurs with both non-clustered and clustered indexes, but clustered indexes exacerbate the problem given that they contain the entirety of the row data.

Of course you could plan for this by keeping a bit of blank space on each of your pages to facilitate at least a couple of changes, which is what the fillfactor is used for in SQL Server (a lower fillfactor leaves more empty space but reduces the true data density - insert performance is improved, but read performance is diminished. A high fill factor increases the real data density and thus read performance, but increases the likelihood of inserts requiring page splits. Note that fillfactor only applies on index creation, and whenever you defragment/rebuild your indexes), however this can be a serious performance issue for out-of-order inserted data, or frequently changing cluster indexed fields. It's for this reason that many developers use the monotonically increasing identity field as their cluster index. Historically there was a worry that having multiple inserts all going to the same "place" in the index, at the end, would lead to scalability killing contention at this hot spot. SQL Server has logic to deal with identity fields and effectively eliminates this hot spot issue.
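
As an illustration of setting a fill factor, the following sketch creates an index with 20% free space and then rebuilds the table's indexes with the same setting. The index name IX_Orders_ShipCity is hypothetical, and the value 80 is an arbitrary example rather than a recommendation. The syntax is the SQL Server 2000 form; later versions also accept WITH (FILLFACTOR = 80) and ALTER INDEX ... REBUILD.

```sql
USE Northwind
GO
-- Hypothetical index: leaf pages are filled to roughly 80% when the index is built.
CREATE INDEX IX_Orders_ShipCity ON Orders(ShipCity) WITH FILLFACTOR = 80
GO

-- Rebuild every index on Orders ('' means all indexes), reapplying a fill factor of 80.
DBCC DBREINDEX ('Orders', '', 80)
GO
```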

Another problem with clustered indexes is their girth (they contain the entire row data). This is largely irrelevant if you're seeking to specific records, or where you actually plan on using all of the data after a lookup, however if you are querying a range of data (for instance the first name of all of the people with the last name "Forbes" in the city of "Oakville") the query engine will read in the entire row contents for each matching record using a range scan, extracting only the requested data. In our example phone book there is so little extra data that it's a minor overhead, however in many large real-world tables this can have a serious performance impact.

Consider if instead we had a secondary index that was sorted by City, Last Name, First Name, the query engine could very efficiently scan past only the small index entries.
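
The phone book scenario might be sketched as follows. The table dbo.PhoneBook, its columns, and the index name are all hypothetical, chosen only to mirror the analogy.

```sql
-- Hypothetical phone book table, clustered on a surrogate key.
CREATE TABLE dbo.PhoneBook
(
    EntryID     int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_PhoneBook PRIMARY KEY CLUSTERED,
    City        varchar(50)  NOT NULL,
    LastName    varchar(50)  NOT NULL,
    FirstName   varchar(50)  NOT NULL,
    PhoneNumber varchar(20)  NOT NULL,
    Address     varchar(200) NOT NULL
)
GO

-- Narrow secondary index sorted the way the query searches.
CREATE NONCLUSTERED INDEX IX_PhoneBook_Name
    ON dbo.PhoneBook (City, LastName, FirstName)
GO

-- Every column referenced is in the index, so the query can be answered
-- from the small index entries without reading the wide rows.
SELECT FirstName
FROM dbo.PhoneBook
WHERE City = 'Oakville' AND LastName = 'Forbes'
```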

Covering Indexes

This brings up a very important point -- Some indexes contain enough information that you don't even need to go to the content, your query being satisfied by the index itself. Consider an index in a tour book that sorts famous attractions by their name, and the country and city that they were located in, pointing to the page where further information regarding it could be found. If you just want to know what city the Accademia dell' Arte del Disegno in Italy is found in, a quick seek through the index will tell you that it's in Florence. In this instance the index was a "covering index", in that it fully covered our request and we didn't need to dereference to the complete topical information. This is often the most efficient query mechanism of all.

Consider the following variation of a query we ran earlier.

SELECT OrderID FROM Orders WHERE ShipPostalCode = '05022'

This will efficiently use the ShipPostalCode index to find the specific record, and because the query is fully satisfied by the index itself, the costly bookmark lookup is avoided, and IO is minimal.

Table 'Orders'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

The observant will note that OrderID isn't actually in the index ShipPostalCode, or at least it doesn't appear to be. The trick is that all of the fields that are the sort fields for the clustered index, if one is defined, are automatically added as data fields to every other index on the table. This can be a blessing in cases where you want one of the clustered fields and suddenly the non-clustered index is a covering index, but it also needs to be weighed against the fact that it makes every other index larger, and thus less "data dense".
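
To check which fields make up the clustered index on a table, and therefore which fields ride along in every other index, the system procedure sp_helpindex can be used. In the stock Northwind script the primary key on Orders should show up as clustered on OrderID.

```sql
USE Northwind
GO
-- Lists each index on Orders with its description (clustered or nonclustered,
-- unique, primary key) and its key columns.
EXEC sp_helpindex 'Orders'
```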

Small covering index seeks are the most efficient method of pulling data, and it's a good reason to ensure that you are only pulling the specific fields that you actually need from any given table, preferably with a covering index. Range scans are also highly efficient in many situations, and are usually used when the wanted entries in an index are consecutive, such as when you search BETWEEN two dates against a date index, though because of the previously mentioned bookmark lookup costs range scans are generally only seen if the index is fully covering, or against the clustered index.
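
A range scan of this kind might look like the following sketch, which assumes the OrderDate index from the stock Northwind script is present. Only OrderDate and the clustered key OrderID are requested, so the index can cover the query.

```sql
USE Northwind
GO
-- The matching entries are consecutive in an index sorted by OrderDate,
-- so the engine can seek to the first one and read forward to the last.
SELECT OrderID, OrderDate
FROM Orders
WHERE OrderDate BETWEEN '19970101' AND '19970131'
```

The unseparated yyyymmdd literal is used because it is interpreted the same way regardless of the connection's language and date format settings.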

Even in the case where a full scan is necessary, indexes might still be fully covering. Consider if the following index were added to the Customers table.

CREATE INDEX CountryCity ON Customers(Country,City,Address)

This index of course contains Country, City, Address, but as mentioned above it also contains CustomerID because it's in the clustered index. Of course if we query on Country, or Country and City, or Country, City, and Address, an efficient seek or range scan might be used to pull the matching records.

SELECT CustomerID FROM Customers WHERE Country = 'Canada'

What if instead we wanted to search only on the address? In that case the index can't be searched in order because it sorts by country and then city and address, and thus a particular address could exist anywhere in the index.

SELECT CustomerID FROM Customers WHERE Address = '43 rue St. Laurent'

You might be surprised to see that it still used the index, albeit this time it's an inefficient scan rather than a seek. The index was used because it covered the query (all predicates and returned columns), and because the index is only a subset of the data it requires less I/O to scan the entire index than it does to scan the entire table data.

Statistics and Bookmarks

In a prior example we ran a query that required a bookmark lookup to satisfy the query (thus it did not have a covering index). The query was as follows.

SELECT * FROM Orders WHERE ShipPostalCode = '05022'

If you look at the execution plan for this query you can see that it seeks the "lookups" in the index, and then does a bookmark lookup against the clustered index (which is the actual table data). In this case there is only a single row to return, but even still the bookmark lookup cost is estimated to account for 50% of the cost of the query.

The cost of bookmark lookups, where an item is found in the index but it isn't a covering index, is the reason why many people are surprised to find that SQL Server has ignored what they believe are perfect indexes and instead table scanned ("Why isn't it using my index! ARGHHH!!!"). Consider the following query.

SELECT * FROM Orders WHERE ShipPostalCode = '24100'

Looking at the query plan you can see that it actually did a clustered index scan (which is a table scan on a table with a clustered index) instead of using our index, and the subtree cost is 0.0530.

This might seem perplexing because we seem to have a perfectly satisfactory index, however let's do the same query again, this time using a query hint to force it to use our index.

SELECT * FROM Orders WITH(INDEX(ShipPostalCode)) WHERE ShipPostalCode = '24100'

If you look at the query plan you can see that it used our index, as demanded, but this time the bookmark lookups account for 80% of the query time. Our subtree cost comes in at 0.0564 -- more than it was doing a table scan!

In this case this was only 10 records of a total of 830 (1.2%) yet still it opted to do a full table scan rather than using our index. Many developers have been perplexed in this situation, wondering why SQL Server was avoiding their beautiful index, but it's doing it for a very valuable reason - it was cheaper than indirectly looking up each piece of data through bookmarks.
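
If you'd rather compare the two estimates as numbers than read them off the graphical plan, SET SHOWPLAN_ALL returns the estimated plan as a result set without executing the queries. A sketch using the two queries above; the figure to compare is the TotalSubtreeCost column on the first row of each plan, and the exact values may differ from those quoted here.

```sql
USE Northwind
GO
-- Must be the only statement in its batch.
SET SHOWPLAN_ALL ON
GO
-- Plan chosen by the optimizer.
SELECT * FROM Orders WHERE ShipPostalCode = '24100'

-- Plan with the index forced by a hint.
SELECT * FROM Orders WITH(INDEX(ShipPostalCode)) WHERE ShipPostalCode = '24100'
GO
SET SHOWPLAN_ALL OFF
GO
```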

Of course we could have avoided bookmark lookups by using the index as a covering index if possible by using a query like the following, presuming this was all the data we needed to extract from the table.

SELECT OrderID FROM Orders WHERE ShipPostalCode = '24100'

Now it uses our index, is super efficient, and has a subtree cost of only 0.0064. In larger databases the difference can be tremendous by avoiding both the bookmark lookup and the table scan.

So how did the query engine guess how many rows would match the criteria, in order to choose which method (whether by index and bookmark lookup, or table scan) would most efficiently satisfy the query? That's where something called distribution statistics comes into play. Statistics are a representative set of the data that are used by the query engine to make a best-guess plan for how to most efficiently serve the data. You can view the statistics for a given index, in the following case for the index ShipPostalCode, via the DBCC SHOW_STATISTICS command.

DBCC SHOW_STATISTICS('Orders','ShipPostalCode')

Due to the limited number of discrete ShipPostalCode values in the table, the statistics are entirely accurate in this case. In a more realistic database, with thousands or millions of rows, statistics start to become much more of an estimation (with an ever increasing margin of error). These estimations can lead to entirely wrong assumptions by the query engine in some edge cases, such as where it thinks a given set of predicates will yield thousands of rows when really it might yield only a couple.

Statistics can also fail for multi-field indexes. In this case the selectivity of the first field is used, so in the case of the index we created earlier (Country, City, Address), due to the fact that the country has a low selectivity (there are lots of entries for each country), the index will often be ignored, even though the city and address combination is highly unique. For this reason it is generally recommended that the most selective field comes first in your index, so in the case of that index the fields would be Address, City, and then Country. This is debatable because it also makes the index more single purpose -- it no longer serves an efficient purpose for less-granular searches like just Country, or Country/City. This needs to be evaluated on a case by case basis, and truly wouldn't be an issue for fully-normalized tables.

It should also be noted that ensuring that your statistics are as accurate as possible is critical. SQL Server includes automatic statistic updates, on by default, where it will attempt to do data sampling and update statistics when it feels they are out of date. Nonetheless it is a best practice to schedule full statistic updating at regular intervals (at a minimum weekly), preferably using the WITH FULLSCAN option so it is as accurate as possible. The standard database maintenance plan includes a step for statistic updating, and allows you to choose the amount of data to sample.
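
The statements behind such a scheduled job might look like the following sketch. How often to run it, and whether a full scan is affordable on a large table, are judgment calls for your own system.

```sql
USE Northwind
GO
-- Rebuild all statistics on Orders by reading every row.
UPDATE STATISTICS Orders WITH FULLSCAN

-- Or refresh the statistics for a single index.
UPDATE STATISTICS Orders ShipPostalCode WITH FULLSCAN

-- Or run a sampled update against every user table in the current database.
EXEC sp_updatestats
```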

Regardless of all of the above, there will be cases where you may find that your statistics are up to date, your indexes are optimal, yet SQL Server is still incorrectly choosing not to use your index. In this case it may be an unfortunate reality that an index hint needs to be added to the query to politely (or rather sternly) request that it reconsider. Obviously this should be a last resort.

Actually Using Indexes

So you've created beautiful indexes, and you've ensured that your query only pulls the necessary data from each table, using covering indexes where possible to avoid costly bookmark lookups. You pull up the execution plan to find…that the query engine is entirely ignoring your index. There are several reasons why this could happen.

Consider the following query.

SELECT OrderID FROM Orders WHERE LEFT(ShipPostalCode,4) = '0502'

Fairly simple query, and from the looks of it one might think it'd be an efficient covered index seek. Upon execution you'll discover that actually it was an inefficient scan. Consider the following instead.

SELECT OrderID FROM Orders WHERE ShipPostalCode LIKE '0502%'

In this case the query is executed as a highly efficient index seek. I have had cases where this tiny difference reduced an enterprise report from running for literally hours to a matter of seconds.

The reason is that in the former the indexed field was hidden within a function. The query engine can't predict what the result of the function will be, so it's forced to evaluate it for every row to see what pops out. LIKE is a first class comparison, it knows how it behaves, so the query engine can actually optimize against it. There are countless cases where people hide criteria fields in functions unnecessarily, and the result is massive, unexpected inefficiency.

The most common example of this mistake is using DATEADD/DATEDIFF to pull rows within a certain period of time -- instead of pre-calculating a fixed demarcation (e.g. precalculating GetDate() - 3 years) and then doing a direct comparison with the row data, developers are forcing whole table scans with wasteful date computations on every single row. For instance, consider a query to report news items that have occurred within the past 12 hours from a hypothetical news table.

DECLARE @CurrentTime datetime

SET @CurrentTime = GetDate()

SELECT * FROM NewsStories WHERE DATEDIFF(hh,NewsDate,@CurrentTime)<12

Guaranteed to be terribly inefficient, yet it's overwhelmingly common. The query engine can much more effectively optimize the following variant.

DECLARE @StartTime datetime

SET @StartTime = DATEADD(hh,-12,GetDate())

SELECT * FROM NewsStories WHERE NewsDate > @StartTime
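
The same rewrite can be tried against Northwind, where the table actually exists. This sketch assumes the OrderDate index from the stock Northwind script is present; compare the execution plans of the two statements.

```sql
USE Northwind
GO
-- The column is hidden inside YEAR(), so the function has to be
-- evaluated for every row.
SELECT OrderID FROM Orders WHERE YEAR(OrderDate) = 1997

-- The same rows expressed as a range on the bare column, which the
-- query engine can match directly against an index on OrderDate.
SELECT OrderID FROM Orders
WHERE OrderDate >= '19970101' AND OrderDate < '19980101'
```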

Indexed Computed Columns

At the outset I advocated that you minimize space usage (increasing real data density). The goal wasn't to try to fit that database on a floppy disk, but rather to minimize the amount of I/O necessary to satisfy a given query, as I/O is the weakest link of most enterprise systems. There are design choices, such as adding additional indexes, that actually increase the size of your database on disk yet reduce the I/O necessary for certain queries, and these are usually very worthwhile trade-offs.

Another powerful technique you can use to trade disk space for improved database performance is indexed calculated columns. There are countless variations, but I'll cover one scenario that is fairly commonly used -- report counts by month for a given year. In the case of the Orders table this could be achieved via the following query.

SELECT YEAR(OrderDate) AS [Year], MONTH(OrderDate) AS [Month], COUNT(*) AS [Monthly Orders]
FROM Orders
WHERE YEAR(OrderDate)=1997
GROUP BY YEAR(OrderDate), MONTH(OrderDate)

Instead of decomposing the date into month and year constituents ad hoc, consider adding them as computed columns.

ALTER TABLE dbo.Orders ADD
OrderDateYear AS CONVERT(smallint,YEAR(OrderDate)),
OrderDateMonth AS CONVERT(tinyint,MONTH(OrderDate))

Now we can change our query to the following.

SELECT OrderDateYear AS [Year], OrderDateMonth AS [Month], COUNT(*) AS [Monthly Orders]
FROM Orders
WHERE OrderDateYear=1997
GROUP BY OrderDateYear, OrderDateMonth

By itself we've done nothing for the query efficiency (in fact it is actually less efficient as it's applying the where predicate after building the set), though we've achieved a bit of "code re-use". However we now have the foundations for some powerful indexed computed columns.

CREATE NONCLUSTERED INDEX IX_OrderDateDecomposed ON dbo.Orders
(
OrderDateYear,
OrderDateMonth
) ON [PRIMARY]
GO

Now the query referencing these computed columns is dramatically more efficient. Even better, these indexed computed columns haven't decreased the real data density of the table because they're only materialized in the index. NOTE: Ensure that queries that don't need these computed fields don't pull them explicitly or implicitly via the wasteful * column selector, as it'll unnecessarily calculate each of the computed fields for each row.
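
Not every computed column can be indexed: the expression has to be deterministic and precise, among other requirements. The built-in COLUMNPROPERTY function reports this, so a check along the following lines can be run once the OrderDateYear column above has been added and before creating the index. Each property returns 1 for true and 0 for false.

```sql
USE Northwind
GO
SELECT
    COLUMNPROPERTY(OBJECT_ID('dbo.Orders'), 'OrderDateYear', 'IsComputed')      AS IsComputed,
    COLUMNPROPERTY(OBJECT_ID('dbo.Orders'), 'OrderDateYear', 'IsDeterministic') AS IsDeterministic,
    COLUMNPROPERTY(OBJECT_ID('dbo.Orders'), 'OrderDateYear', 'IsIndexable')     AS IsIndexable
```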

Indexed Views

[This functionality only exists in the Enterprise and Developer editions of SQL Server 2000]

An indexed view, sometimes referred to as a materialized view, is a sort of indexed computed columns on steroids, taking the idea of storing computed results to the next level. In a previous example we improved the performance of some data aggregation logic by grouping on some indexed computed columns. We can take a variant of that and create a view out of it.

CREATE VIEW dbo.OrdersByMonth
WITH SCHEMABINDING
AS
SELECT OrderDateYear AS [Year], OrderDateMonth AS [Month], COUNT_BIG(*) AS [Monthly Orders]
FROM dbo.Orders
GROUP BY OrderDateYear, OrderDateMonth

As it is the view is acting as nothing more than a template for queries against the underlying table, and is only of benefit for code reuse (which itself is a very worthwhile goal). We can take that a step further by materializing this view so the actual results are stored, and changes to the underlying table are automatically reflected in the aggregates. The following command creates the indexed view.

CREATE UNIQUE CLUSTERED INDEX IX_OrdersByMonth ON OrdersByMonth(Year,Month)

Now the following query will be satisfied by the indexed view, using the pre-computed values, rather than recalculating for every query.

SELECT [Year],[Month],[Monthly Orders] FROM OrdersByMonth WHERE Year=1997

In the end the resource usage for our monthly order count query has dropped by about 90% over the original query, and this is for a tiny sample database. In the real-world the differential can be extraordinary.

Indexed views do have some downsides, such as the automatic maintenance that occurs whenever the underlying data changes, however they can be an extraordinarily powerful tool in your arsenal and deserve further research if your platform supports it.
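
If the execution plan shows the view being expanded back into a query against Orders rather than read from its stored rows, the NOEXPAND table hint asks the engine to treat the indexed view like a table. A sketch against the dbo.OrdersByMonth view defined above; whether the hint is needed depends on your edition and version, so check the plan with and without it.

```sql
USE Northwind
GO
-- NOEXPAND tells the query engine to read the view's own clustered index
-- instead of expanding the view definition.
SELECT [Year], [Month], [Monthly Orders]
FROM dbo.OrdersByMonth WITH (NOEXPAND)
WHERE [Year] = 1997
```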

Summary

  • Keep your rows as small as possible to maximize real data density.
  • All tables should have a clustered index except in rare exceptional situations. Normally this will be a single field primary key.
  • Small clustered indexes keep non-clustered indexes small, increasing real data density.
  • Clustered indexes help make other indexes covering indexes.
  • Queries serviced by covering indexes are extremely efficient.
  • Avoid hiding criteria fields in functions -- indexes will not be used for them.
  • Consider indexed computed columns where appropriate.
  • If you shelled out the cash for Enterprise Edition, seriously evaluate how indexed views fit in your solutions.
  • Index, index, index! Only in extremely rare cases are the additional update and insertion costs associated with maintaining indexes heavier than the benefits.
  • Understand execution plans. Evaluate them regularly.
   
Monday, February 20 2006

[The following repost of "legacy" yafla content is preparation for the long awaited publishing of part III, which will be published through this medium. To give consistency, I'm reposting Pt I and II in this format]

def. Enterprise
adj. Terminology often used to excuse terribly inefficient software designs, and to justify massive hardware overkills for trivial solutions. e.g. Enterprise solution.

Introduction

Inefficiency is a gluttonous thief. It burglarizes your server rooms at all hours of the day and night, demanding virtually limitless hardware sacrifices to satiate its endless thirst for clock cycles and disk rotations. In return it punishes your users with reduced performance and reduced satisfaction, and devastates your solution's scalability.

This inefficiency, materialized in the form of slow performance, is one of the primary causes of system abandonment. This is particularly troublesome in the SQL Server world where many systems servicing large user bases often run on low cost server boxes that leave little margin for performance waste. Many organizations have tossed out their SQL Server solution running on a $3000 PC because the performance wasn't satisfactory (not achieving so-called 'Enterprise' performance), to replace it with a multi-million dollar mainframe solution, overcoming embarrassing inefficiency with brute force.

Reward

Several years back, in a moment of nerdish bravado, I made a foolish blanket statement that I could reduce the runtime of virtually any element of a non-trivial SQL Server database solution by 95% (thus improving the performance by about 20x), doing so through some rudimentary changes requiring nothing more than some analysis, minor code changes (changing the underlying code, but not the functionality), indexing, and file group changes. To my surprise, and even greater dismay, this number actually proved to be remarkably accurate: From giant multi-hour organization wide reports, to simple security procedures run hundreds of times a minute, the obvious low hanging fruit alone often improved performance by 10x or more. With a little bit of elbow grease it has proven extraordinarily common to improve performance by 20x or more, significantly improving responsiveness and load handling of the respective systems at minimal cost.

The remarkable thing is that these weren't systems implemented by bad developers - many of them were extraordinary developers who implemented a lot of tricks and techniques that I've co-opted and added to my own bag of techniques. Instead there seems to be a dearth of real information on developing for performance in SQL Server, leaving many to guess about the best approach, not to mention that there isn't enough attention paid to performance efficiency in enterprise solutions. Many seem to be under the false impression that gross inefficiency requiring massive clusters to perform trivial tasks merits a capital-E Enterprise designation.

Motivation

In software development there's an oft-referenced vice known as 'premature optimization'. This is the tendency to prematurely focus on code performance while code is still young and awkwardly growing, before the critical performance weaknesses have been identified and measured. The end result of this misguided effort is often convoluted code that is difficult to understand and maintain (for instance code including inline assembly or using specialized system hacks in seldom called edge functions). This is often a mistake of inexperienced programmers that haven't had the perfectionist engineering streak beaten out of them.

Consider also that performance truly isn't a concern for the vast majority of code in most client-side applications - it likely doesn't matter if the code that validates an input box in a Windows Forms application takes 3ms or 70ms to complete. As the processing is decentralized and isn't impacting other users who might be running the application elsewhere on the planet, it is basically making use of 'free' clock cycles available on the client PC, and generally is imperceptible to the user. If one thousand different users were running the application simultaneously, they're running it on a thousand powerful PCs, effectively throwing a massive 'cluster' at the problem. In other words, you can overcome application inefficiency on the client side through massive computational excess and an endless ability to scale out. Even in cases where there are worthwhile performance issues identified, for example an image processing algorithm that takes several seconds to perform an operation, it's often best to wait until the project nears a release and the code has settled, at which point you can send a commando performance team to profile and then selectively improve the slowest sections of code that will have the most beneficial impact, focusing on the lowest hanging fruit, yielding a bounty of quick wins. (Taking one for the team because there's no I in team, and no cliche unworthy.)

Enterprise databases, or any centralized system for that matter, are entirely different beasts - performance is one of the critical elements of these systems, and performance problems are one of the primary reasons why solutions are abandoned or re-architected. Consider that every clock cycle wasted on a shared resource, such as a database server, impacts the performance of the overall system and every other user. In most environments there is a massive asymmetry between the computational capability of client machines, and the computational capability of a shared system, such as a database server. There are usually some fixed financial and technological limits to the amount of hardware that a system can scale to, so your database server running on a lowly Dell two-way server is desperately trying to keep up with the demands of 500 user workstations pounding away at it. Even though Google is clustered on purportedly thousands of machines, they still have to develop efficiently to be able to economically service millions of users in a timely manner.

Thus, while it might seem irrelevant when taken alone that your stored procedure saturates the resource, taking 200ms to return a simple list of values to populate a drop list for Joe User, imagine 100 users all opening that form at the same time, putting a shared demand on the database system. The performance impact starts to become significant and adversely affects the usability (and credibility) of the system. This is exacerbated by the fact that simultaneous performance demands aren't merely additive on shared resources; rather, contention and task sharing often mean that these issues snowball into much more than the sum of the parts.
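
To put rough numbers on that scenario, here is a back-of-the-envelope sketch. It assumes a two-CPU server (the "two-way" box mentioned above), that each request keeps one CPU busy for its whole duration, and that there is no contention overhead at all, so it understates the real-world problem. The function name and the model itself are illustrative assumptions, not measurements.

```python
import math


def ms_until_last_response(requests, cpu_ms_each, cpus):
    """Milliseconds until the final request completes when the server can
    run at most `cpus` CPU-bound requests at once (no contention cost)."""
    waves = math.ceil(requests / cpus)
    return waves * cpu_ms_each


# 100 users open the form at once; the procedure costs 200ms per call.
before = ms_until_last_response(requests=100, cpu_ms_each=200, cpus=2)

# The same burst after a 95% runtime reduction (200ms down to 10ms).
after = ms_until_last_response(requests=100, cpu_ms_each=10, cpus=2)

print(before, after)  # 10000 500
```

Under these generous assumptions the unluckiest user waits ten seconds for a drop list before tuning and half a second after it.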

Wisdom

You should consider the performance of your database from day one with every table you add, every index you create, every trigger you concoct, and every relationship you define. While the misguided will argue that this amounts to premature optimization (as Ralph Waldo Emerson observed, a foolish consistency is the hobgoblin of little minds, and the belief that any performance concerns are premature is just such a foolish consistency), the reality is that the performance of a database system is largely defined by the fundamental design of the system, and as the system grows it becomes much more difficult and costly to solve fundamental performance problems. Furthermore, once an enterprise system reaches production even the simplest performance change, such as adding an index, requires complex analysis to determine how it impacts other parts of the system, or that it satisfies what could be hundreds of procedures accessing the object.

The cynical will wonder how one can predict the future when designing a database system, but the reality is that the access patterns are usually obvious by the time you're starting to design tables - you know how the tables relate, what data will be searched, how often you'll be selecting the records versus modifying them, and how big the fields and records should be. Use this information effectively when developing the tables to choose the appropriate clustered and secondary indexes, to minimize the size of each record, and to write efficient SQL. Don't leave it for a maintenance programmer to reverse engineer the system and apply best guesses in a moment of crisis in the future.

Agenda

Part II and III will introduce a variety of common performance pitfalls and panaceas in the SQL Server world, touching upon (but not limited to) the following:

  • Indexes - clustered and non-clustered
  • Fundamental table designs
  • Filegroups
  • Cursors
  • Materialized views, computed columns
  • Common Performance Problems
  • Surprising SQL Server behaviours


 SQL 
   
Friday, February 17 2006

I've received some great feedback regarding the entry on setting up a MediaWiki install on Windows. Many of the comments were kind words of thanks (which I really appreciate. Knowing that it helps people is my greatest motivation), and others helpfully suggested improvements to the instructions.

As an example of comment-driven improvements, my instructions have you installing the GNU diff utilities, in particular for the diff3.exe utility; however, the MediaWiki setup scripts don't properly find it (i.e. as the instructions are currently written the GNU diff utilities are completely unused, although they can still be useful in your day-to-day travails). This is because a prior revision included fairly involved changes to the MediaWiki config/index.php script so it would properly locate diff3 on the Windows platform, as it is currently Unix-centric and doesn't look for the proper executable, not to mention that it parses the PATH environment variable incorrectly. After receiving two comments that those steps were a little too complex, however, I removed that section.
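
For a sense of why Unix-centric lookup code tends to stumble on Windows, here is a small Python sketch. It is not the MediaWiki script and makes no claim about that script's exact bug; it assumes the usual culprits, namely splitting PATH on ':' instead of ';' and searching for a bare program name without an extension. The directory names and both helper functions are hypothetical.

```python
import ntpath  # Windows path rules, importable on any platform


def split_search_path(path_value, separator):
    """Split a PATH-style string into its non-empty directory entries."""
    return [entry for entry in path_value.split(separator) if entry]


def candidate_files(program, directories, extensions):
    """List every full path at which `program` might live."""
    return [
        ntpath.join(directory, program + extension)
        for directory in directories
        for extension in extensions
    ]


# A made-up Windows PATH value for illustration.
windows_path = r"C:\Program Files\GnuWin32\bin;C:\php"

# Unix-style splitting on ':' cuts every drive letter off its directory.
wrong = split_search_path(windows_path, ":")

# Windows separates entries with ';' and executables carry an extension.
right = split_search_path(windows_path, ";")
candidates = candidate_files("diff3", right, [".exe"])

print(wrong)
print(candidates)
```

In real Python code the standard library already handles this: os.pathsep gives the platform's separator and shutil.which performs the whole search.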

My goal was to get people experimenting with MediaWiki, or even just wikis in general, so diff3 functionality really wasn't critical. I pared the instructions accordingly. Similarly one early draft included the building and installation of a PHP memory cache to improve performance, but that too is unnecessary to simply try out the product.

Another line of comments involved asking:

  • Why would I give instructions for Windows. People should just set it up on Linux and go with its native home.
  • -or- Why would I recommend a wiki product that largely caters to the open source crowd. Instead I should be pushing SharePoint, or something properly anointed by the Microsoft camp, enabled with all of the latest Microsoft buzzwords.

To answer this I really need to describe the philosophy of this blog, along with my resistance to "technology alliances".

In the byline of this blog I describe my philosophy as "pragmatic software development", and this really drives my recommendations. In this case there are a lot of development shops that are Windows-centric, with little or no UNIX/Linux experience, yet MediaWiki is one of the best, most feature-rich, "standard" wiki products out there. Choosing a solution that combines what shops already know with the best product is a pragmatic approach.

Which brings me to my general philosophy towards Microsoft, as comments indicating that I'm either a Microsoft hater, or a Microsoft drone parroting the corporate line, have hit my inbox over the short history of this blog.

I am not subservient to Microsoft.

Unlike many Microsoft technology advocates (I truly love both SQL Server, and .NET, and I think they're remarkable solutions), I have no desire to ever work for Microsoft (Microsoft has some top notch, world-class talent, and I've met and worked with a lot of great talent from there, but they also have their share of both jerks and duds). I'm not going to praise their every move in hopes that I'll get noticed. yafla, my consulting/ISV company, has chosen to avoid any partnerships or tying to the Microsoft brand because we don't want to become another drone "consulting" company single-mindedly acting as a third-party sales force for Microsoft, desperately racking up Microsoft partner points by pushing less-than-optimal solutions on customers. We didn't choose to use .NET for our software because we're hoping to nestle into the Microsoft family -- we chose it on technical merit, and a pragmatic analysis of our current and prospective clients.

We work for our clients and ourselves, not Microsoft. This is a very important mantra for our services, and for the technology of our software, and if Microsoft wants their products to get recommended to our clients, and their technology to be the foundation of our software, they need to make great products at competitive prices. No sales gladhanding, or sad career dreaming, is going to change that.

Am I saying that Microsoft solutions are second rate? Of course there are examples of Microsoft products that are terrible, and customers are being misled into buying buzzword-laden atrocities because a Microsoft partner is hoping to get invited to the next Microsoft dinner party. Yet there are also Microsoft solutions that are extraordinary. Windows 2003R2 is a superlative operating system, and where you need the breadth of its functionality, it can be well worth the money. Microsoft Small Business Server can be an amazing package of value for some small organizations, within the constraints of the product. Other times, however, if you have the appropriate skills, a Linux machine is the best choice, along with a stack of the many available free or close to free server products on that platform. Sometimes IIS 6 is the superior solution for a problem, while other times Apache would be your best bet. Sometimes PHP and MySQL is a great solution, and other times C#/ASP.NET with SQL Server is the perfect combo.

I don't blindly assume the Microsoft product to be the best, but neither do I automatically presume it to be second rate. Instead I evaluate on merit, and propose solutions based upon the customer and their needs.

To do otherwise would be just biased noise, and wouldn't be to the service of clients and peers.


   
Friday, February 17 2006

Dale Begg-Smith, a Canadian-born moguls skier who emigrated to Australia several years back (becoming a citizen there), purportedly to take advantage of their much more relaxed national ski program, won a gold medal at the Olympics yesterday. Congratulations to Mr. Begg-Smith on the incontestably extraordinary performance.

What makes Mr. Begg-Smith more interesting to those of us in this profession, however, is his off-the-slopes career path.

If reports are true, he somehow managed to bag tens of millions of dollars of net worth in the internet game over the past several years -- the sort of thing that gets every other technology worker's spouse asking "Why can't you do that?" Much like the .com bubble, it gives the perception to some that there's a tremendous wad of easy money just floating around, waiting to be grabbed from the ether.

What really draws one's attention, however, is how vague Mr. Begg-Smith is about what, financially at least, was a very successful business venture. At 21 years old he appears to have accumulated more wealth than most will in their lifetime, and he's pursuing his dreams and darting around in Lamborghinis. I've seen him described as a whiz-kid (he apparently dropped out of school at 14 [!]), an internet mogul, an internet tycoon, an internet genius, and virtually every other "gee whiz!" descriptor that the media pulls out when someone does well (by skill, or by luck) with technology. Yet he's entirely secretive about what his business did, what its current state is, how big it is/was, and so on.

In the same situation most of us would be bragging unstoppably to all that could hear about our business prowess.

The reason for the secrecy, apparently, is the nature of the business. If reports are true, Mr. Begg-Smith and his brother made their fortune through the less savoury side of the net. The side that most of us would never consider (which is why spammers and adware/spyware perpetrators manage to make so much money: There's a lot of demand for their services, but very few who are willing to provide them), technical capacity or not.

In any case, a lot of the backlash seems to be founded in envy, which is sad. The guy financially did very well for himself, and is demonstrably a world class athlete. All in all a pretty remarkable accomplishment for 21.

   
Thursday, February 16 2006

Introduction

This article describes the wonder and curiosity that many developers start out with, whether it's when they entered their first Compute! type-in program on their Atari 400, picked up their first JavaScript in 1 Hour book, when they started toying with the gcc compiler for the first time, or when they began working towards their first Computer Science degree in university.

It also describes how that natural enthusiasm can be crushed, and how it can hopefully be regained or maintained.

This is written for the developer, whether a new recruit or a veteran, motivated or unmotivated, spirited or crushed, yet it's also written for software development managers (who might identify how to make the workplace more enjoyable and more rewarding).

Like most entries of this genre (see also Optimal Software Development Processes and Practices) I selected a small list of widely applicable, but often overlooked, factors. This most certainly isn't exhaustive, but hopefully it leads to a bit of reflection.

The Enjoyable Profession of Software Development

Software development can be a tremendously rewarding, enjoyable career.

Few careers offer comparable opportunities to weave intricate, complex structures that, while virtual, have such a positive impact on the world around them. Few offer the freedom and creativity that software development does, or the very real potential for entrepreneurial riches.

Whether it's building a new peer-to-peer application, control software for a massive power generator, or improving the workflow of the corporate scorecard system, done right this can be a very fulfilling, enjoyable, challenging pursuit.

A Passion for Software Development

Does your mind race at all hours, abuzz with potential solutions for vexing software development challenges? Do you lie awake at night -- anxious like a preschooler on Christmas Eve -- eager for morning to arrive so you can implement the crafty coding structures you just thought up? Do you frequently find yourself powering up your system in the twilight hours to implement the fruits of an epiphany?

Or do you put in just enough face time and superficial effort so that sacrifice makes up for undelivered results? Do you purge your mind of software development the moment the virtual end-of-day whistle goes off, sliding off your Aeron dinosaur satisfied that it's one day closer to the weekend? Do you dread Mondays, motivating yourself to keep going with the dream of a far off vacation?

Do you eagerly embrace new technologies, seeing them as challenging opportunities to learn something new when a solution calls for a new skill? Would you voluntarily dive into the innards of the Firefox web browser if a solution demanded it and you'd never touched it before? Do you swim through documentation, thirstily absorbing new APIs, tools, and languages to expand your skill-set, eagerly embracing industry advances?

Or do you dread anything different, praying that you're tasked with challenges that require only the skills you've long held, allowing you to apply them in a mechanical, repetitious fashion? Do you hope every project is an echo of a prior project? Do you put off any task requiring research, and show disdain towards new languages, techniques and practices, hoping that they don't gain traction?

Are you really passionate about software development? Be honest with yourself.

A desire to outshine a teammate isn't passion. Nor is a motivation to impress the boss. Neither is a combination of the two worn as a magic defensive cloak against downsizing spells. These are second-rate, artificial passion substitutes: Mixed into the recipe, they yield sub par results, often leaving a nasty aftertaste of burnout and dissatisfaction.

Instead I'm talking about a bona fide interest and enjoyment of the craft and challenge of software development, even outside of career or job security issues (though it benefits the same). This isn't a job ad demanding that you're "passionate about business reports!", but rather is just a moment for sober reflection on whether you're over-clocking life, or running idle instructions in a tight loop.

If you're like many software developers in the industry today, a feeling of enthusiasm and enjoyment for the pursuit is just a distant memory (often during the happy days of university and your first job). Instead it has become a career, and is just something you do from 9-5 (or more when passion is replaced by sacrifice). Skills have likely stagnated, moving just enough to compete with coworkers, or to avoid obsolescence.

Of course there are those who've never enjoyed this career, and they probably will never enjoy it -- it just isn't their thing. The only advice I can offer to those people is a suggestion that life is too fleeting to spend so much time doing something you don't enjoy.

Many others, however, remember the passion, and sporadically get a fleeting taste of it again. For those people I propose some personal habits that, coupled with workplace practices (for managers, as well as people who rightfully manage up), will help recapture and maintain that passion.

Software developers who truly love what they are doing are the ones creating the most innovative code. They're the ones with productivity rates multiples of their peers. They're the ones that feel a little guilty getting paid to do something they enjoy so much.

The Top 5 Habits of Productive, Happy Software Developers

1. Be Marketable - Keep Up To Date Skills and Network Contacts


Most of us will work for over a dozen different firms over our careers.

We'll leave for better salaries and working conditions. We'll relocate to accommodate a spouse's career. We'll be laid off during corporate mergers and spin-offs, or even when the company goes bankrupt. We'll get turfed out because we're over-skilled, and thus overpaid, relative to the needs of the position. We'll be downsized because we aren't compatible with the new boss' empire building schemes. Maybe we'll get bored of a position and seek out something new.

This is the employment reality of most careers in the 21st century.

To some professionals this represents an exciting journey, and each transition is met with anticipation and enthusiasm. These people feel confident in their abilities, have a network of peers in the industry communicating interesting opportunities, and their skillset is up-to-date and marketable (they have the appropriate laundry list of abilities, credentials and certifications, and upgrade as needed), and while the possibility of their current employer closing shop tomorrow is something they'd prefer not happen, and they probably love the great group of people that they work with, it isn't something that they fear.

To less prepared professionals, however, the idea of losing their cushy job hangs over them like a black cloud. Their lack of apparent opportunities, and the feeling that they couldn't find an equivalent job, is enormously destructive of both motivation and job satisfaction. Paradoxically, job protectionism (such as making one "indispensable" through obscurity, by denigrating coworkers, and so on) often becomes a more likely activity of people in such positions than legitimate contributions.

This is incredibly destructive to morale, not just for the individual in question, but for everyone on their team: Often the malcontent, contagiously demotivated member of the team is the least employable, and it can be debated which condition led to the other.

SUMMARY:  No matter how much you love your current job, you should keep your CV current, and you should always keep up-to-date on industry opportunities. Know what skills are in demand, and try to gain experience in them (even if it means pursuing formal or self-training during your own time), and attain a level of comfort that you could transition to a different opportunity with minimal discomfort.

MANAGER SUMMARY: You should do everything in your power to make your group feel confident in their abilities -- ensure that everyone gets a chance with marketable technologies; encourage the pursuit of desirable certifications; and build skills through internal resources, workshops, and seminars. Unless you're running a sweatshop, this is unlikely to lead to a feared exodus of employees, but instead will empower and motivate your group to more openly contribute, and to demand more of each other.

2. Be The Master of Your Domain

The control we have over our environment can have a tremendous impact on our happiness. 

Something as simple as a sporadically malfunctioning key on our keyboard can ruin an entire day, for instance. Similarly, when you're nearing a deadline and your network connection starts flaking out, it can make an enjoyable jog to the finish line a frustrating exercise of physical restraint (in this case restraining yourself from tearing the wiring out of the wall). At least we have optical mice now, eliminating one of the primary causes of environmental control frustration.

Many times our work habits inevitably bring a feeling of "lack of control" into our work lives: By failing to fully read the documentation for our tools, investigating their behaviour, APIs, and nuances, we often create a situation where much of our development is basically crap-shoot trial and error, reacting as things don't work as planned.

I've witnessed development groups, not to mention that I've demonstrated this unsavoury trait myself, unhappily fighting with perceived technology deficiencies (usually as a deadline rapidly approaches), moaning and complaining about what seem to be faults in the language, technology, or platform, forever building workarounds under a fog of uncertainty, when in reality it was actually a fault in the understanding of the same.

More often than not it's simply that they haven't spent the upfront time to understand the language (I remain amazed at the number of C# developers who have no idea what the using keyword is for, or why seemingly out-of-scope file objects are still locking files until some magical, indeterminate time in the future. Or the Delphi developers who needlessly nulled variables at the end of scope in a futile misguided attempt to fight mystery bugs), the technology, or the platform. Their frustration is created out of ignorance, and a small up-front investment would have sped up development, increasing the sense of control that the developers have over their domain.
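
The C# using statement exists for deterministic cleanup: the resource is released when the block ends rather than whenever the garbage collector gets around to it. The sketch below shows the same idea with Python's closest analogue, the with statement, instead of C#; the file name is a throwaway chosen for the example.

```python
import os
import tempfile

directory = tempfile.mkdtemp()
path = os.path.join(directory, "report.txt")

# Deterministic cleanup: the handle is closed the moment the block exits,
# even if the body raises, rather than at some later collection.
with open(path, "w") as report:
    report.write("done\n")
closed_after_block = report.closed

# Nothing holds the file open any more, so it can be removed right away.
os.remove(path)
os.rmdir(directory)
removed = not os.path.exists(path)

print(closed_after_block, removed)  # True True
```

Without the block, the handle lingers until the runtime decides to finalize the object, which is the "magical, indeterminate time in the future" behind the lingering file locks.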

SUMMARY: The next time something seems mysterious or unknown, take the time to properly investigate it. Classic lack-of-control approaches such as hacked workarounds or "reset the server daily" lead to a feeling of losing control, reducing job satisfaction and adding to the natural daily frustrations. And get your keyboard replaced if it starts malfunctioning.

MANAGER SUMMARY: Identify and investigate "easy-outs" proposed by your development team. While most software has faults, and products and technologies often work differently than we might imagine, many times such excuses are due to a lack of investigation and analysis. Even when things don't work as advertised, which is frequently the case, formally investigating and empirically determining behaviours is vastly superior to each developer endlessly fighting with and then hashing out strategies on a need basis. And make sure your developers have functioning keyboards.

3. Accommodate Your Financial Needs

I've worked in some great positions at the wrong times in my life, sapping my motivation until eventually I moved on. These positions were for great firms, with great working conditions and great coworkers and management, but they couldn't realistically adapt to accommodate my evolving financial needs. I invented dissatisfactions with the situation, turning an ideal situation into a daily torture.

After getting married and planning for our first child, for instance, the financial risk/reward that worked when I was living alone in a $600 apartment eating Ramen noodles was no longer satisfactory. Demands of owning a home, a car with infant carseats, education funds, daycare (for two children costing more than it would cost to lease two (2) BMW 750i's), and boxes and boxes of diapers, required more financial returns than I needed years before.

I moved on.

While the resulting role superficially wasn't as satisfactory, from a life perspective my mood brightened dramatically, and my day was much more enjoyable.

Of course this seems like cheap advice: Make more money! And Fast! Yet the reality is that developers often do make choices to the detriment of their financial condition, and if they go too far they will hate their job no matter how perfect it otherwise is. Working for equity of a start-up is great when you're just out of university, but it is destined for failure when you're more established.

SUMMARY: If your financials are out of balance, it will unavoidably sour your mood during the workday, making you resent your employer and your workplace. When life goals exceed the income of your position, immediately begin investigating alternatives (be it asking for a raise, looking for a more senior role in your organization, or seeking employment elsewhere). No motivational boost or cool company games room will overcome this basic life need.

MANAGER SUMMARY: Be aware of the goals and needs of your group. Sometimes someone's needs grow beyond the possible return of a position, and it is important to appropriately communicate this (rather than giving vague hints of unseen raises and super-bonuses at some future point).

4. Have A Life Outside of Work


This is a rule that works for all professions -- having accomplishments providing satisfaction outside of work will smooth the inevitable downs of our professional lives, often providing one with a much better perspective. Without this, often minor workplace failures can explode into seemingly momentous events.

These accomplishments can even be in the same domain: A professional coder by day, and an open-source coder by night, for instance.

SUMMARY: There will be periods when everything seems to go wrong in the workplace. Having the cushion of achievements outside of work can avoid it spiraling into a workplace disaster, keeping spirits up through the tough times. Often non-work experiences benefit the workplace as well, whether it's techniques learned from nighttime projects, or delicious coffee courtesy of the nighttime barista classes.

MANAGER SUMMARY: There is a world outside of work.

5. Properly Manage Expectations

Developers, as a general rule, are terrible at managing expectations: Many of us are prone to overpromising deliverables, assuring stakeholders that we'll deliver these amazing results sooner than is reasonable. I've fallen victim to this syndrome myself, and I've seen it occur rampantly across the industry.

When D-day comes we convince ourselves that the users built their own unrealistic expectations, and that managers forced us into untenable timelines. While often that is the case, just as frequently the developers were the origin of misinformation.

While there is a temporary sense of satisfaction wowing users and management with an exaggerated declaration of our abilities (we've likely even convinced ourselves), as time wears on this misinformation can be enormously destructive and debilitating. With every day closer to the deadline we get a little more desperate for a silver bullet, hoping that some magic technology or component will deliver us from damnation.

It seldom works out that way.

Users are unhappy. Management is dissatisfied. Employees are demoralized and devastated.

The best option is always to manage expectations, to ensure that we can reasonably deliver promised results without heroic effort.

SUMMARY: Plan for the long term, realizing that promises that aren't delivered on will cause you great workplace unhappiness later. Manage expectations to ensure that you can satisfy your "customers" with reasonable effort, and with a reasonably high probability of success.

MANAGER SUMMARY: Never demand unrealistic deadlines, and question employees when provided with the same. Encourage your troops to be more reasonable with their promises, especially to stakeholders outside of the group, and they'll have a much greater probability of meeting external expectations, leading to increased motivation for everyone.

Conclusion

This is an amazing, expansive career full of incredible innovation and endless opportunity. Ensure that you don't diminish your enjoyment through simple mistakes, such as pigeon-holing yourself into a position, or endlessly setting yourself up for failure.

Control your destiny.


   
Tuesday, February 14 2006

I've been doing this as a somewhat regularly updated blog for just over half a year now, and the results have been extremely satisfying: I get about 2500 direct unique visitors on an average day (increasing 2-6x when something ends up being a meme-of-the-day on sites like Reddit or Digg, and of course many read via aggregators), search engine referrals are up to 200 or so a day, and viewing the "who's on" list is a laundry list of influential corporations and locations across the globe.

It does feed my ego a little bit seeing visitors from various governments, the CIA, nuclear research labs, just about every large financial company, and visitors from every end of the globe. My numbers aren't huge, but it's a perfect composite of influential and knowledgeable readers.

The most popular entries thus far are as follows (I'm providing the static version links where possible):

Effectively Integrating Into Software Development Teams
Optimal Software Development Processes and Practices
Spelling Matters
Everyone Is Above Average - The Overpopulated Top 2%

I've tried to minimize the number of entries (outside of the personal category, this anniversary one being an exception) to keep the noise as low as possible -- if you're using a reader it won't constantly pretend there's new content when I'm just adding a peanut gallery comment about someone else's blog -- though on the flip side that means that I've delayed various .NET and SQL entries until they're "perfect".

Perhaps I might have to find a compromise somewhere in between.

   


About the Author
Dennis Forbes is a Toronto-based software architect. While focused primarily on the .NET and SQL Server worlds, Dennis frequently ventures outside of this comfort zone into game development and image processing. He has been published in several industry magazines, has been quoted in the Wall Street Journal and has been interviewed by NPR.

He is a vice president and lead software architect at an innovative New York City hedge fund back-office services firm.

Dennis has been working on solutions for the financial, telecommunications, and power generation markets for over 15 years.





 