A classic, forever repeating quandary when designing applications that store large numbers of data files (images, user files, supporting documents, etc) is whether to store the files in the database, or to store them in the filesystem with pointers in the database.
Consider a web application that tracks support tickets, where users can add supporting documents to their tickets. In many implementations the files are uploaded and stored in a filesystem location with a unique name or directory (often a GUID), and that unique filename is stored in the database alongside the record. This means that accessing a record requires file system access as well as database access.
There are significant disadvantages to this, including the lack of transactional integrity of the filesystem objects, the difficulty of management (trying to coordinate file system and database backups to be able to restore to a consistent state), the security issues, the lack of relational integrity (files could be deleted, records could be deleted without cleaning up the related files, and so on), among others.
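To make the relational integrity problem concrete, here is a minimal sketch of the file-pointer pattern and of a sweep that finds the orphans it can leave behind. The table name `ticket_files`, its columns, and both function names are assumptions made up for illustration, and SQLite stands in for whatever database you actually use.

```python
import sqlite3
import uuid
from pathlib import Path

# Hypothetical schema: one row per uploaded file, pointing at a GUID-named file.
SCHEMA = "CREATE TABLE IF NOT EXISTS ticket_files (ticket_id INTEGER, file_name TEXT)"


def store_document(db, root: Path, ticket_id: int, data: bytes) -> str:
    """Write the upload under a GUID name, then record that name against the ticket."""
    name = uuid.uuid4().hex
    (root / name).write_bytes(data)
    with db:  # sqlite3 commits on success and rolls back on an exception
        db.execute(
            "INSERT INTO ticket_files (ticket_id, file_name) VALUES (?, ?)",
            (ticket_id, name),
        )
    return name


def find_orphans(db, root: Path):
    """Return (files on disk with no record, recorded names with no file)."""
    recorded = {row[0] for row in db.execute("SELECT file_name FROM ticket_files")}
    on_disk = {p.name for p in root.iterdir() if p.is_file()}
    return on_disk - recorded, recorded - on_disk
```

Nothing in the database stops either set from being non-empty, which is exactly the integrity gap described above; a sweep like this can only detect the damage after the fact.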
On the flip side, the advantage of this technique is reduced load on the database server (e.g. you could offload file storage to a very large scale NAS device), as well as immediate file system access where appropriate (e.g. an administrator needs no special tools to browse the files, although this could be considered a detriment as well). Many developers find it an easier model to implement using the file system for supporting files.
For those who prefer the file system in such scenarios, the transactional integrity deficiency of this technique will be fixed in Windows Vista (formerly Longhorn) and related technologies, which introduce a transaction-capable variant of NTFS (TxF). NTFS is already a journaled and reliable filesystem; TxF will add the ability for the filesystem to participate in distributed transactions - both intra-machine and inter-machine - with standard two-phase commit functionality. This means, for instance, that when the user is adding a record that includes a supporting document, the file and record could be created under a shared transaction in the middle tier (or even in the database if you use it as a conduit, storing and retrieving filesystem objects in the database logic), and if either fails they both fail (avoiding dirty data). Combine this with the ability to add complex database logic to probe and validate the correlating file system (SQL Server 2005 can host .NET functionality, meaning that your trigger can more robustly check for file existence when records are created, and delete files when records are removed), and it becomes a much more credible option.
(As an aside - Distributed transactions - transactions across heterogeneous resource managers - have traditionally been very, very slow. This new file-system transaction functionality most certainly isn't free, but where the reliability is critical - which is almost always, given the cost and uselessness of dirty data - it can represent a great improvement. Registry changes will also be boundable in distributed transactions.)
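Without a transactional filesystem, the usual workaround is to approximate "both succeed or both fail" in application code. The sketch below is that approximation, not TxF and not a real two-phase commit: the `ticket_files` table, the `.tmp` staging suffix, and the function name are all assumptions for illustration, with SQLite standing in for the database.

```python
import os
import sqlite3
import uuid
from pathlib import Path

# Hypothetical schema; NOT NULL gives the insert a way to fail.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS ticket_files "
    "(ticket_id INTEGER NOT NULL, file_name TEXT NOT NULL)"
)


def add_ticket_document(db, root: Path, ticket_id, data: bytes) -> str:
    """Stage the file, insert the record, publish the file, and undo on any error."""
    name = uuid.uuid4().hex
    staged = root / (name + ".tmp")
    final = root / name
    staged.write_bytes(data)
    try:
        with db:  # the row is committed only if this block exits cleanly
            db.execute(
                "INSERT INTO ticket_files (ticket_id, file_name) VALUES (?, ?)",
                (ticket_id, name),
            )
            os.replace(staged, final)  # publish the file just before the commit
    except Exception:
        # Compensate: the insert has been rolled back, so remove the file too.
        for path in (staged, final):
            path.unlink(missing_ok=True)
        raise
    return name
```

The weakness is the window between publishing the file and the commit: if the process dies there, a file exists with no record and nothing cleans it up. Closing that window is precisely what a filesystem that can enlist in the same transaction would offer.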
Supporting Links
MSDN Page
Channel 9
Video on TxF (Given that the long-form name is the correctly descriptive Transactional NTFS, shouldn't the abbreviation be TxNTFS? Longer to say, but it seems superior to me)
Tagged: [Software Development], [Programming], [Software-Development], [Vista]
It always surprises me that large domains don't configure common mistypings of their subdomains in their DNS. For instance:
w.microsoft.com
ww.microsoft.com
wwww.microsoft.com
For that website it only works as http://www.microsoft.com, or http://microsoft.com. It seems logical that they should add the common derivatives in, point them to a multi-host-header site that does nothing but redirect to http://www.microsoft.com, and voila - A lot of sloppy-typing users save a bit of time and avoid a bit of frustration.
Microsoft actually makes use of a lot of their subdomains (e.g. research.microsoft.com, msdn.microsoft.com), however for smaller sites it could simply be a wildcard entry. Even if there are lots of subdomains, the redirect logic could do some analysis of the typed-in entry and figure out the likely destination, e.g. madn.microsoft.com probably wanted msdn.microsoft.com, and resrch.microsoft.com probably wanted research.microsoft.com.
Of course yafla.com isn't configured like that - I don't control the DNS in this case so I didn't have the option.
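The "analysis of the typed-in entry" could be as simple as a fuzzy string match against the subdomains that really exist. Here is a rough sketch using Python's standard `difflib`; the `KNOWN_HOSTS` list, the 0.5 similarity cutoff, and the function name are my own assumptions, and a real site would presumably build the list from its DNS zone.

```python
import difflib

# Assumed list of real subdomains for the example.
KNOWN_HOSTS = ["www", "msdn", "research"]


def redirect_target(host: str, domain: str = "microsoft.com") -> str:
    """Guess which real subdomain a mistyped host name was meant to reach."""
    suffix = "." + domain
    label = host.lower()
    if label.endswith(suffix):
        label = label[: -len(suffix)]  # keep only the subdomain part
    # Closest known subdomain at or above the similarity cutoff, if any.
    match = difflib.get_close_matches(label, KNOWN_HOSTS, n=1, cutoff=0.5)
    subdomain = match[0] if match else "www"  # fall back to the main site
    return f"http://{subdomain}.{domain}"


print(redirect_target("madn.microsoft.com"))
print(redirect_target("resrch.microsoft.com"))
```

The catch-all site behind the wildcard DNS entry would call something like this with the incoming Host header and answer with a redirect to the result.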
A bit of an odd entry today, spurred by the general lack of awareness regarding these acronyms and their meaning: While they're undoubtedly old-hat for a database admin in a large enterprise (where they certainly play a critical role), they're less likely to be found amidst the parlance of smaller shops, or among professionals who function as hybrid developer/database architect/system architect. As many of the people who visit fall in those latter two categories, I thought it worth a quick overview. Even if you're a developer and these are all deployment concerns, you should know what the network guys are talking about when they discuss these concepts.
All three acronyms exist in the world of segregated storage systems, which in a nutshell is the requisitioning of storage capacity separate from computational requirements: Instead of calculating each server's needs as an island, as a composite of computational and storage needs, you instead pool the storage requirements and facilitate that via one of these storage technologies. Pooling brings some advantages of scale, some technological advantages, not to mention that your capacity utilization and peak performance will likely improve.
Segregated storage often allows for much greater scalability, allowing you to add disks and upgrades to the storage systems, transparently improving capacity and performance throughout the entire infrastructure (versus each system being an isolated performance unit).
So a brief overview of what each of the acronyms means, and how it applies.

Network attached storage is generally used to describe servers (or "appliances") that are requisitioned for the sole purpose of being file servers, often running a lightweight or specialized NAS-specific operating system (for instance Windows Storage Server 2003, or a specialized version of Linux). NAS systems with massive capacities, often with redundancy such as RAID, can be purchased for incredibly low prices these days, many of them - including those built on Windows Storage Server 2003 - with no additional licensing fees (e.g. you can add a huge-capacity NAS device to serve your entire enterprise for only the cost of the box itself - no additional per-user licensing issues). RAID (Redundant Array of Inexpensive Disks) basically means the system has redundancy such that one or more of the hard drives can completely fail with no loss of data, frequently exhibiting just a decrease in performance, but with no downtime: usually you can plug a replacement drive in - while the system is running - and it'll automatically bring the new drive online and populate it, restoring performance. (I'm ignoring the misnomer "RAID 0", which is actually a performance technique that offers no redundancy.)
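For the curious, the reason a failed drive can be rebuilt is easiest to see with single-parity redundancy. This is a toy sketch of the XOR parity idea only, under my own assumptions: real arrays stripe data, rotate parity across drives, and work on far larger blocks, and the function name and sample data are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three "data drives" holding one block each, plus one parity block.
data = [b"NAS!", b"SAN!", b"iSCS"]
parity = xor_blocks(data)

# Drive 1 dies; XOR-ing the survivors with the parity recovers its block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Reads that touch the missing drive have to do this reconstruction on the fly until the replacement is populated, which is where the temporary performance drop comes from.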
NAS systems generally support common file sharing protocols like CIFS/SMB (Windows), NFS (Unix/Linux), and so on, and usually integrate into Windows domains and Active Directory infrastructures for security purposes, so they seamlessly interoperate with your existing infrastructure. NAS is even making inroads in the home, with many alpha-geeks installing a very high capacity, high-performance NAS box for media files and centralized storage, supporting various other computing devices throughout the house.
Some NAS resources:
Windows Storage Server vendors
An inexpensive, high performance NAS starting point (the same company makes a highly lauded solution for the home. ~$1000 for 1TB. Here's a good entry about that product)
Iomega NAS servers
Dell PowerVault 754N
SQL Server 2000 I/O Configuration in a SAN/NAS Environment
Wikipedia NAS entry
Apart from being a destination for backups, NAS can also host SQL Server databases themselves (e.g. your database server is running on server A, but the actual data is on server B, managed by server A over your high speed network), and with certified hardware (WHQL) this configuration is supported by Microsoft. To do so you just need to create or restore the database to a UNC location.
e.g.
CREATE DATABASE SampleUNCDatabase ON
( NAME = Sample_dat,
  FILENAME = '\\mynas\db\sample.mdf',
  SIZE = 10MB,
  MAXSIZE = 2000MB,
  FILEGROWTH = 10MB)
LOG ON
( NAME = Sample_log,
  FILENAME = '\\mynas\db\sample.ldf',
  SIZE = 5MB,
  MAXSIZE = 25MB,
  FILEGROWTH = 5MB)
Those of you playing along at home will have been surprised by the following error.
Msg 5110, Level 16, State 2, Line 1
The file \\mynas\db\sample.mdf is on a network path that is not supported for database files.
Msg 1802, Level 16, State 1, Line 1
CREATE DATABASE failed. Some file names listed could not be created. Check related errors.
By default SQL Server doesn't support hosting databases on network locations, as there are some caveats that need to be considered (namely the throughput - NAS is accessed via a generalized file sharing protocol on top of a generalized transport protocol, often over a lower speed transport, and can kill database performance). You can enable UNC hosting by enabling trace flag 1807. Just make sure your NAS is accessed over a dedicated or low-usage Gbps or better network connection.
e.g.
DBCC TRACEON(1807)
CREATE DATABASE...
Success
You can read more about this at http://support.microsoft.com/default.aspx?scid=304261. This configuration is supported with appropriate hardware (which generally means "running against a NAS that runs Windows Storage Server 2003").
NAS cannot be used in SQL Server clustering scenarios. For that you need to look at a traditional or iSCSI SAN.
As NAS is operating at a higher level, hiding the details of the underlying storage, to defragment a NAS device you would have to do it on the device itself, specific to that NAS. You could do backups on the NAS itself, though in-use files like SQL Server's data files would need agents to be backed up online.

While a NAS operates at a higher, more abstract level (the file share level, agnostic to the underlying file technologies and hiding the actual storage topology), in contrast a SAN functions at a much lower level.
SANs operate at the virtual-disk access level, using block and "physical locations" to define what to read and write, with the client devices taking a more direct role in the "layout" (at least as far as the client is concerned) of the data: Client systems are allocated blocks of SAN storage - which usually appear as a bona fide drive on the client system (with appropriate drivers) - and are connected via a dedicated 1 to 4Gbps fibre network. Generally only one client can access a logical device on a SAN at a time, however with SQL Server clustering you can point several database servers at the same logical device, and if one fails another takes over the device (though it is still only one at a time). The protocol on the SAN fibre network is usually SCSI.
SANs are generally very expensive, and are usually the domain of very large enterprises. As SANs operate at a much lower level, basically operating as a dumb bank of bits and blocks, these devices can become fragmented, though defragmentation would have to operate at the logical disk level, and generally needs to be performed by the PC that "owns" that logical disk. As SANs appear to the operating system as a disk - just as if it were an internal drive directly connected to the client - there are no limitations on its use beyond those that exist for a local drive.
Many SANs have a value-add in the form of snapshot functionality, where they can take an image of a logical drive and store it somewhere else (perhaps as an online whole-volume backup). While this seems trivial, they can usually do it while the volume is online and being written to, via a transaction log sort of architecture. This can be very valuable in many scenarios.
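One common way to snapshot a volume that is still being written to is copy-on-write: the first time a block changes after the snapshot, the old contents are set aside. The toy model below is only my assumption about the general idea, not any vendor's implementation, and the `Volume` class and its methods are invented for the example.

```python
class Volume:
    """Toy block device supporting copy-on-write snapshots."""

    def __init__(self, block_count, block_size=4):
        self.blocks = [bytes(block_size)] * block_count
        self.snapshots = []

    def snapshot(self):
        """Start a snapshot; it costs nothing until blocks are overwritten."""
        saved = {}
        self.snapshots.append(saved)
        return saved

    def write(self, index, data):
        # Preserve the pre-write contents once per snapshot, then overwrite.
        for saved in self.snapshots:
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, saved, index):
        """Read a block as it looked when the snapshot was taken."""
        return saved.get(index, self.blocks[index])


vol = Volume(block_count=2)
vol.write(0, b"aaaa")
snap = vol.snapshot()
vol.write(0, b"bbbb")  # the volume stays online and writable
print(vol.read_snapshot(snap, 0), vol.blocks[0])
```

The snapshot only grows as blocks are changed, which is why taking one is nearly instant even on a very large volume.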
Some SAN resources:
Wikipedia entry on SANs
Windows SAN Integration Technologies

iSCSI is basically the SCSI disk control protocol over IP (internet protocol). The benefit being that you can access a storage device over anything that can relay IP, including ethernet, wireless, or even the public internet. Much like a SAN, iSCSI is a dumb-bag-of-bits, and the client that owns a block of data is responsible for its management.
iSCSI has two real roles of interest: The target (the dumb-bag-of-bits that's listening and responding to iSCSI requests), and the initiator (the client computer, on which the virtual drive has been mounted). Initiators exist for virtually all modern operating systems, and there are even targets for many operating systems to allow them to operate as bags-of-bits (if you had a general purpose server with a huge array of under-utilized hard drives, and adequate network bandwidth, you could block off some of that storage to act as a storage drive for another server). Alternately there are dedicated network applications that act as iSCSI targets.
iSCSI is appearing in some inexpensive forms, and most iSCSI solutions fall pricewise somewhere between NAS and SANs. Like SANs, many iSCSI solutions have snapshot functionality. Also like SANs, iSCSI storage networks can be used for Windows clustering solutions (as of a service pack to come in early 2006) - for instance in SQL Server clustering.
Some iSCSI resources
Wikipedia entry on iSCSI
Microsoft iSCSI support (including initiator)
Windows iSCSI target
Free Linux iSCSI target
While this wasn't intended as a complete guide to these technologies, hopefully it has given enough of an overview that there is an appreciation of what they are, and how they might fit in most enterprises.
This entry is a bit of meta-blogging - blogging about blogging. I try to avoid doing this, but blogging is an "industry" in serious need of a reality check, and while I'm hardly in a position to do so, I can at least take a nibble at the toes of it.
Firstly, to avoid the seeming hypocrisy of me criticizing blogs in a blog entry, I should answer the simple question why do I blog?
The easiest answer is that it's an on-the-record (e.g. searchable) archive of thoughts and technical explanations that I think are valuable (not to all people, but rather to some people, some of the time). Under my control and ownership, I am publishing thoughts and opinions for worldwide consumption. Not that everyone in the world - or even the tiniest percentage of it - is going to want to consume it, but it is accessible to most of the world. It beats posting thoughts on random discussion boards, to be edited or censored beyond my control, and for someone else's benefit.
Right now, for instance, with a relatively middling pagerank, Google is sending about 100 people a day from around the world here for their search terms (usually technical searches), followed far behind by a dogfight between Yahoo and MSN. I get a feeling of satisfaction knowing that I've (hopefully) helped people, whether it's implementing hierarchies in SQL Server, understanding the benefits of the new functionality of SourceSafe 2005, filtering EXIF from their images or understanding what value GPS will add to our digital photography, building Firefox extensions, or converting color schemes. On top of that, about 20x more get here daily via RSS readers, links, or bookmarks. More still have access to these thoughts via aggregators.
Knowing that some people can get value from some of the entries is very satisfying to me. Most of the people will skim past and move on, but some will get genuine use out of it.
I also blog for reputation. To a small degree I am laying out who I am on here. I think I've demonstrated that I'm a fairly smart guy, with a lot of experience and a pragmatic perspective, and I have a given set of beliefs and perceptions. I've never attempted to pander to anyone: I've alternately offended both the pro-Linux and pro-Microsoft crowds, as well as the open-source and commercial software vendors. I am not a mouthpiece of someone's dogma, and while sometimes my opinion and perspective coincide with one camp's, that doesn't indicate any alignment.
A lot of the people who I (along with associates and subcontractors) consult for (consulting on SQL Server, software architecture, Biztalk, Sharepoint, outsource software auditing, and custom software development in the Greater Toronto Area) - the real revenue generating side of yafla - view these pages, and it has been very beneficial.
Additionally (and most importantly to me) I am laying the communications groundwork for a product (Product X). I'm not into pushing vapour so I won't say any more, but that is largely the selfish reason why I've sought PageRank and readers.
So that's why I blog. The fact that these are sequential, time-based entries is largely irrelevant (which is why I removed the time on the entries - it doesn't matter if I put this up at 1:32pm or 2:27pm) - ultimately it's just a convenient content management system with a usable delivery mechanism (RSS).
Which brings me to blogging in general. Fewer than 1% of the blog world, I would say, are interface blogs - they're information conduits - a one-way interface - between a business or project (or even celebrity) and consumers. Want to know what's up with SVG in Mozilla? Just read http://weblogs.mozillazine.org/tor/. If you want to know what's up with Web Services @Apache, just visit http://ws.apache.org/blog/.
There are similar blogs for most software groups and technologies these days.
These sorts of blogs - often faceless blogs with little personality - can be tremendously valuable in keeping customers up to date on a project or product's happenings. They can even be internally beneficial in corporations as project or team façades. For instance "Project Life Admin Overhaul Project" with frequently added status updates: An on-the-record, historically-traceable, centralized location for information dissemination. This could be a Sharepoint site as easily as it could be a bona fide blog, but the purpose and value are exactly the same. The goal is usually to limit the scope to the product or project (no personal chatter about team lunch get-togethers or funny cat incidents), so that when something pops up the reader knows that it impacts the product.
Eventually as Project X is publicized here, a separate project blog will be created that contains nothing but product news. No pictures of my car rides, meta-blog comments, or random technical commentary.
Of the remaining 99% of blogs, a significant percentage are personal blogs that really aren't intended for anyone other than family members and close friends (and even they only visit when they're guilted into doing so. "Hey Tom...did you see my latest blog entry?").
The remainder is filled up with opinion blogs (blogs largely patronized by people who already drank the kool-aid, and they're just going there to surround themselves with like-minded far-Right or far-Left minded individuals): These blogs are vastly less influential than they are generally imagined to be, as their readership is already stuffed with the converted. The only people reading an Open Source Evangelism blog, for instance, are open source advocates. Joe VP isn't wandering in there when deciding what to base the next platform on.
I left out one rather large group, which is bloggers that blog about blogging and bloggers - the meta-bloggers. This is a very, very large group of individuals. See Robert Scoble, strangely one of the most popular bloggers out there (something which I attribute to a "first mover advantage" - Robert was associated with some of the people who basically invented the concept of blogging, so he started getting the links early. Now that he's entrenched, and virtually no one really bothers linking anymore, he remains "powerful" among the blogging about blogging community). 90% of his blog is filled with either links to random stuff that other people are saying (there is very little original content in the blog world it sadly seems. It's easier to say "someone says..." than it is to actually say yourself. For every actual piece of content posted, there are probably 100 "someone said" blog entries), or talk about blogging.
If you look at his adoring community, and follow some of the backlinks, you discover a large, incestuous network of bloggers that are blogging about blogging, and linking to each other's blog entries about blog entries about blog entries that talk about blogging about blogging on blogging with blogging. You even get people warning about blog "celebrities", like Scoble, releasing too much personal information. "As public interest in blogging grows, he can only get more famous." the blogger writes.
Give me a break. Bloggers are just so full of themselves.
While Scoble is surrounded by pro-Microsoft sycophants begging for a job at Microsoft, and has an adoring community of please-link-to-me advocates (9 times out of 10 you'll find the same incestuous link between Scoble and the sites he links to. Oh boy, I sure hope Scoble links here!), most of the world just doesn't care. Scoble's appeal is extraordinarily limited, and the idea of him being a celebrity among anyone other than a core group of Microsoft groupies and blog evangelists (the latter is a declining group - it really isn't an innovation anymore. The former will exist as long as there are people desperate to work for Microsoft) is delusional.
I commented in my notes regarding Microsoft's Launch 2005 event that one of the presenters asked who in the audience read the TechNet Canada blogs, and this was just another example of bloggers getting full of themselves. Of course they didn't - the audience reaction was almost entirely negative. Why would they? Why would some corporate developer trying to fight with SQL Server to solve a deadlock issue sit reading the blogs of Canadian Microsoft technology evangelists (basically glorified salespeople)?
The idea is ludicrous, but there was that expectation, just as there's the flawed perception that the majority of people are eagerly and anxiously consuming blogs every day. It's a complete disconnect with reality, because people are grossly over-estimating the impact of blogs.
And this is the crux of blogging - It is a domain that has some definite uses, but in some ways it's a pyramid scheme: The illusion of the rising impact of bloggers is really just the blog community eating itself - selling itself to itself - all desperately tracking each other to lazily gain content for the next entry. "So and so said....here's my off the cuff take on that".
Sort of like this entry. Chomp!
Firefox 1.5 has been released, and is available for download at http://www.mozilla.com/firefox/. While superficially it looks like nothing has changed, there are some huge improvements hidden just below the surface.
All of these are fantastic to see - Firefox really is blazing its own path now, no longer caught in the no-win situation of simply following Microsoft's lead. Of course Firefox has been better at standards conformance and nuances of CSS for some time, but that doesn't really inspire a lot of end-user adoption - it's the features that matter, and in that domain it has taken a hefty lead (including over anything I've seen with IE 7).
I've been using Opera 8 as my primary browser for several months, after a couple of years with Firefox as my mainstay. Given some of the improvements I think I'm going to switch back.
Earlier today, while perusing the meme sites to see where the groupthink arrow is pointed today, I came across links to the following highly-ranked (at least by anonymous numerics) page.
http://microformats.org/wiki/rest/ahah
I checked the calendar to see if it was April 1st, but alas it does not appear to be. This actually appears to be serious.
This is where the AJAX trend has brought us - people who have contributed nothing to the global knowledge pool are rushing to remora off of the creations of others and claim them as their own. Every obvious potential use for a programmatic element can become a cheap acronym that someone can append their name to, desperately hoping that they earn some fame for their heroic act of sitting on the sidelines and naming things years after they've entered common use. The fact that the linked page uses the term "discovered" to describe the "discovery" of the most obvious and prevalent use of the XMLHttpRequest (and friends) object is mind-boggling.
I took a few moments today and rolled out some improvements to yaflaColor.
http://www.yafla.com/dforbes/yaflaColor/ColorRGBHSL.aspx
Again, I have to add the standard disclaimer I add every time I mention this: It is a very simple little tool that I created primarily to scratch my own itch, however hopefully it's useful to someone else.
BTW: Why did I "publish" this tool? PageRank. I've gotten a lot of inbound links to it from people who appreciate the usefulness and ease of use, and those inbound links help my pagerank cause. So if you like it and enjoy it, I'd appreciate if you linked it. Thanks!