Transactional Filesystem in Vista

Published December 06, 2005

Dennis Forbes


A classic, perpetually recurring quandary when designing applications that store large numbers of data files (images, user files, supporting documents, etc.) is whether to store the files in the database, or to store them in the filesystem with pointers in the database.

Consider a web application that tracks support tickets, where users can add supporting documents to their tickets. In many implementations the files are uploaded and stored in a filesystem location with a unique name or directory (often a GUID), and that unique filename is stored in the database alongside the record. This means that accessing a record requires both file system access and database access.
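To make the pattern concrete, here is a minimal sketch of it in Python, using SQLite and a local directory as stand-ins for the real database and file store. The table name `attachments`, its columns, and the function names are hypothetical, chosen only for illustration.

```python
import sqlite3
import uuid
from pathlib import Path


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table that holds the pointers to files."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attachments ("
        "ticket_id INTEGER, original_name TEXT, stored_name TEXT)"
    )


def store_attachment(conn, storage_dir, ticket_id, original_name, data):
    """Write the upload under a GUID name, then record that name in the database."""
    stored_name = uuid.uuid4().hex  # unique on-disk name, unrelated to the upload's name
    (Path(storage_dir) / stored_name).write_bytes(data)  # step 1: file system write
    with conn:  # step 2: database write, committed independently of the file
        conn.execute(
            "INSERT INTO attachments (ticket_id, original_name, stored_name) "
            "VALUES (?, ?, ?)",
            (ticket_id, original_name, stored_name),
        )
    return stored_name


def load_attachment(conn, storage_dir, ticket_id):
    """Reading a record takes a database lookup followed by a file read."""
    row = conn.execute(
        "SELECT stored_name FROM attachments WHERE ticket_id = ?", (ticket_id,)
    ).fetchone()
    return (Path(storage_dir) / row[0]).read_bytes()
```

Note that the two writes in `store_attachment` are separate operations: nothing ties the fate of the file to the fate of the row.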


There are significant disadvantages to this, including the lack of transactional integrity for the filesystem objects, the difficulty of management (coordinating file system and database backups so that both can be restored to a consistent state), security issues, and the lack of relational integrity (files could be deleted out from under their records, records could be deleted without cleaning up the related files, and so on).
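The relational integrity problem is the easiest of these to demonstrate. The following is a rough sketch of the kind of consistency check such systems end up needing; it assumes a hypothetical `attachments` table with a `stored_name` column holding the on-disk file name, and a single flat storage directory.

```python
import sqlite3
from pathlib import Path


def find_inconsistencies(conn: sqlite3.Connection, storage_dir):
    """Return (orphaned_files, dangling_records) for a pointer-in-database layout.

    orphaned_files:   files on disk that no record points to
    dangling_records: stored names in the database whose file is missing
    """
    on_disk = {p.name for p in Path(storage_dir).iterdir() if p.is_file()}
    in_db = {row[0] for row in conn.execute("SELECT stored_name FROM attachments")}
    return sorted(on_disk - in_db), sorted(in_db - on_disk)
```

A check like this can only report the damage after the fact; it does nothing to prevent the two stores from drifting apart in the first place.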

On the flip side, the advantages of this technique are reduced load on the database server (e.g. you could offload file storage to a very large scale NAS device), as well as immediate file system access where appropriate (e.g. an administrator needs no special tools to browse the files, although this could be considered a detriment as well). Many developers also find the file system the easier model to implement for supporting files.

For those who prefer the file system in such scenarios, the transactional integrity deficiency of this technique will be fixed in Windows Vista (formerly Longhorn) and related technologies, which introduce a transaction-capable variant of NTFS (TxF). NTFS is already a journaled and reliable filesystem; TxF adds the ability for the filesystem to participate in distributed transactions, both intra-machine and inter-machine, with standard two-phase commit functionality. This means, for instance, that when the user adds a record that includes a supporting document, the file and the record could be created under a shared transaction in the middle tier (or even in the database if you use it as a conduit, storing and retrieving filesystem objects in the database logic), and if either fails they both fail, avoiding dirty data. Combine this with the ability to add complex database logic that probes and validates the correlating file system (SQL Server 2005 can host .NET functionality, so a trigger can more robustly check for file existence when records are created, and delete files when records are removed), and the file system becomes a much more credible option.
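TxF itself is a Windows API and is not shown here. The sketch below only illustrates the "if either fails they both fail" behaviour, approximated in portable Python with a staged file and an SQLite transaction. The `attachments` table, the `.staged` suffix, the function name, and the `simulate_failure` flag are all assumptions made for the example.

```python
import os
import sqlite3
import uuid
from pathlib import Path


def add_ticket_with_document(conn: sqlite3.Connection, storage_dir, ticket_id,
                             data, simulate_failure=False):
    """Create the file and its record together, or leave neither behind."""
    storage = Path(storage_dir)
    stored_name = uuid.uuid4().hex
    final_path = storage / stored_name
    staged_path = storage / (stored_name + ".staged")
    staged_path.write_bytes(data)  # stage the file under a temporary name
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO attachments (ticket_id, stored_name) VALUES (?, ?)",
                (ticket_id, stored_name),
            )
            if simulate_failure:
                raise RuntimeError("simulated failure before commit")
            os.replace(staged_path, final_path)  # publish the file just before commit
    except Exception:
        # Compensate by hand: remove whatever the file-system side left behind.
        for leftover in (staged_path, final_path):
            leftover.unlink(missing_ok=True)  # requires Python 3.8+
        raise
    return stored_name
```

This is compensation rather than a true shared transaction: a process crash between the rename and the database commit would still leave an orphaned file. Closing that window is what a file system that can enlist in the same two-phase commit as the database is meant to provide.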

(As an aside: distributed transactions, meaning transactions across heterogeneous resource managers, have traditionally been very, very slow. This new file-system transaction functionality most certainly isn't free, but where reliability is critical, which is almost always the case given the cost and uselessness of dirty data, it can represent a great improvement. Registry changes will also be able to participate in distributed transactions.)
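For readers unfamiliar with two-phase commit, here is a toy, in-memory sketch of the protocol's shape. The `Participant` class and `two_phase_commit` function are hypothetical names, and the sketch leaves out everything that makes real distributed transactions expensive (durable logging, network round trips, recovery after a crash).

```python
class Participant:
    """Toy resource manager, standing in for a file system or a database."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote. A real resource manager would durably log its promise here.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes; else roll back everywhere."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    if all(votes):
        for p in participants:  # phase 2: unanimous yes, so commit
            p.commit()
        return True
    for p in participants:  # phase 2: at least one no, so abort
        p.rollback()
    return False
```

With one participant for the file system and one for the database, a "no" vote from either side aborts both, which is the property the shared transaction relies on.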

Supporting Links
MSDN Page
Channel 9 Video on TxF (Given that the long form name is the correctly descriptive Transactional NTFS, shouldn't the abbreviation be TxNTFS? Longer to say, but it seems superior to me)





(C) Dennis Forbes 2007