Some time back I posted the following on a discussion board I frequented at the time, wishing for the described feature to bless the SQL Server database engine:
Automatic relational lookup fields. e.g. Someone has a table
where they store, for instance, a form name. They store this
nvarchar(64) (say averaging 30 characters, averaging 60 bytes a
record) over and over again, redundantly. In a table with millions
of records there are only a dozen variations of the form name, and
the end result is tremendous bloat of the data, indexes, and of
course lookups are that much slower as well because of the I/O
requirements. To some database designers this is a reasonable
design because it is using a natural key (rather than an
artificially generated autonumber), but it is extremely inefficient
with space.
I would love the ability to toggle a boolean switch on the column
and SQL Server will automatically setup a hidden lookup table;
which it will automatically maintain based upon values inserted.
Concievably it could also scale the relational value (e.g. tinyint,
smallint, int, bigint) based upon the number of values in the
lookup. Of course I normally do this myself, but when you walk into
a large system that exhibits this sort of issue pervasively, it's
difficult to fix - you can hide the table behind a view, and use
some INSTEAD OF triggers to do the auto-mungification, however that
is a leaky abstraction, as Joel would say (e.g. INSERT FROM fails,
Enterprise Manager doesn't use it properly, and so on). It's such a
brainless, rudimentary task, it is one of those simple but
effective features that would be worthwhile and would improve SQL
Server.
Back to your regularly scheduled program.
Most of the replies completely missed the point, going off on tangents about how the database should be normalized better (not always an option when you have largescale, widely deployed enterprise systems. Not to mention that using large natural keys can be completely normalized, but still benefitting from this sort of improvement). Others suggested that the database engine should be as dumb as possible, doing nothing beyond storing tuples in the simplest and most obvious manner.
This came back to mind seeing a whitepaper about IBM's just released DB2v9 ("Viper"), offering optional lempel-ziv row-level compression, yielding many of the benefits that I mentioned above. This sort of compression, similar to the "compression" described above, would do wonders for a Sugar CRM database, for instance. Of course other implementations have something similar, for instance MySQL's enum field type.