I came across an old project where I had to extract a thumbnail frame from a video file, and thought I'd archive the technique here for search engine purposes. This can be useful for scenarios like corporate training video directories, where you let the trainers upload videos and the system automatically creates a thumbnail for clients to browse in a webapp, for example.
The first thing you'll need is an interop assembly that lets you use the DirectShow COM objects (specifically the DexterLib type library of DirectShow Editing Services, which lives in qedit.dll) from within .NET. You can either add a COM reference to your project or, better still, create the interop with the tlbimp.exe framework utility from the command line. The latter is preferred because you'll want a strongly named interop assembly, which you can get by passing tlbimp.exe a key file through its /keyfile: parameter (a key file you created with the sn.exe framework utility), for instance sn -k dexter.snk followed by tlbimp qedit.dll /keyfile:dexter.snk /out:Interop.DexterLib.dll. Add the newly created interop assembly as a reference in your project. Named as above, it will appear in your references as Interop.DexterLib.
We need to define a couple of structures in order to communicate with DirectShow. They mirror the leading fields of DirectShow's VIDEOINFOHEADER, which is all we need to read the frame dimensions.
```csharp
[StructLayout(LayoutKind.Sequential)]
struct element_RECT
{
    public int left;
    public int top;
    public int right;
    public int bottom;
}

[StructLayout(LayoutKind.Sequential)]
struct element_VideoHeaderInfo
{
    public element_RECT rcSource;
    public element_RECT rcTarget;
    public UInt32 dwbitrate;
    public UInt32 dwbiterrorrate;
}
```
The following code presumes that three variables exist (for instance as parameters to a function): videoFilename, the path of the source video; videoOffset, the time in seconds of the frame to capture; and bitmapDestination, the path of the bitmap file to write.
In your extraction function, create a MediaDet instance, set the source video file path via the Filename property, and search for a video stream. A file can contain multiple streams, and we identify a video stream by its format type GUID, 05589f80-c356-11ce-bf01-00aa0055595a (FORMAT_VideoInfo). The following presumes that the source file has using Interop.DexterLib; and using System.Runtime.InteropServices; in it.
```csharp
MediaDet mediaInt = new MediaDet();
mediaInt.Filename = videoFilename;

// FORMAT_VideoInfo: streams described by a VIDEOINFOHEADER
System.Guid videoHeader = new System.Guid("05589f80-c356-11ce-bf01-00aa0055595a");

bool videoStreamFound = false;
_AMMediaType oMediaType = new _AMMediaType();
int streamCount = mediaInt.OutputStreams;
for (int counter = 0; counter < streamCount; counter++)
{
    // Selecting the stream makes StreamMediaType describe it
    mediaInt.CurrentStream = counter;
    oMediaType = mediaInt.StreamMediaType;
    if (oMediaType.formattype == videoHeader)
    {
        videoStreamFound = true;
        break;
    }
    // Not a video stream: free the format block allocated for this media type
    Marshal.FreeCoTaskMem(oMediaType.pbFormat);
}
```
Notice that we're setting the CurrentStream property on each iteration, so when a video stream is found, the MediaDet object will already have it selected as the active stream.
If we found a video stream in the file, read the frame dimensions from its format block so that we capture the full frame, and then have MediaDet save the frame at the specified time offset to the destination bitmap file.
```csharp
if (videoStreamFound)
{
    unsafe
    {
        // pbFormat points at a VIDEOINFOHEADER; we only need its source rectangle
        element_VideoHeaderInfo* header =
            (element_VideoHeaderInfo*)oMediaType.pbFormat.ToPointer();
        mediaInt.WriteBitmapBits(
            videoOffset,
            header->rcSource.right - header->rcSource.left,
            header->rcSource.bottom - header->rcSource.top,
            bitmapDestination);
    }
    // The caller owns the format block, so free it once we're done
    Marshal.FreeCoTaskMem(oMediaType.pbFormat);
}
```
Now make sure that the COM object is released predictably.
```csharp
Marshal.FinalReleaseComObject(mediaInt);
```
Note that pointers (scary!) are used above, so this code needs to be compiled with /unsafe. Don't worry, it's safe.
This technique works for pretty much any non-DRM multimedia file that contains a video stream, for which there is a DirectShow compatible codec installed on the system.
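If you would rather not compile with /unsafe at all, Marshal.PtrToStructure can copy the format block into a managed struct instead of casting the pointer. The following is a minimal sketch of that alternative. It repeats the two structs from above so that it compiles on its own. FormatBlockReader and GetFrameSize are hypothetical names of my own, and the pointer passed in is assumed to be the pbFormat of a FORMAT_VideoInfo media type.

```csharp
using System;
using System.Runtime.InteropServices;

// Same layouts as the structs defined earlier, repeated so this compiles standalone.
[StructLayout(LayoutKind.Sequential)]
public struct element_RECT
{
    public int left;
    public int top;
    public int right;
    public int bottom;
}

[StructLayout(LayoutKind.Sequential)]
public struct element_VideoHeaderInfo
{
    public element_RECT rcSource;
    public element_RECT rcTarget;
    public UInt32 dwbitrate;
    public UInt32 dwbiterrorrate;
}

public static class FormatBlockReader
{
    // Copies the leading bytes of a format block (assumed to be a VIDEOINFOHEADER,
    // e.g. _AMMediaType.pbFormat) into a managed struct, so no pointer cast is needed.
    public static void GetFrameSize(IntPtr pbFormat, out int width, out int height)
    {
        element_VideoHeaderInfo header = (element_VideoHeaderInfo)Marshal.PtrToStructure(
            pbFormat, typeof(element_VideoHeaderInfo));
        width = header.rcSource.right - header.rcSource.left;
        height = header.rcSource.bottom - header.rcSource.top;
    }
}
```

To use it, replace the unsafe block with a call to FormatBlockReader.GetFrameSize(oMediaType.pbFormat, out width, out height). Then pass width and height to WriteBitmapBits, and still free pbFormat afterwards. The cost is one small copy of the header, which is negligible next to decoding a frame.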
One note about this entry: I would use the PRE element and lay out the code better, however Radio Userland exhibits a trait that drives me nuts with software: It is too clever, in a way that is often very detrimental. From removing attribute formatting, to completely reformatting PRE blocks, to auto-linking links that it shouldn't link, I seem to spend too much time trying to avoid its "helpful" logic. This seems to be the case with too much software out there.
"It appears that you're writing a letter..."
Tagged: [.NET], [NET], [Software Development], [Programming], [Software-Development]
Back on September 13th I declared that SVG was a dead technology. Since then, the release of Firefox 1.5, along with the free-as-in-beer state of Opera (both featuring native SVG rendering engines), has really spurred SVG activity. I've been getting dozens of SVG-related search hits here a day, and that's for an old article that I wrote back in 2002. It could be that the community has finally caught on to this fantastic technology.
SVG might not be dead after all.
Tagged: [SVG], [Programming], [Software-Development]
One of the benefits of being in the industry for a few years (I've been professionally developing and providing system consulting services for 12 years now, and of course was in the amateur ranks for a decade before that) is that you get to see history revised firsthand. This is especially true in the web app world, where the history of the platform is being rewritten by people who want to change it for their own gain, or who simply weren't involved in the industry and thus have incomplete knowledge, rewriting it purely out of ignorance.
A frequent loser in this rewriting is Microsoft: Whether it's imagining Microsoft to be a web app laggard (I was developing for the Microsoft technology stack, making web apps that blow away what people are amazed by today...6 and 7 years ago. Microsoft was a web technology superstar, but because most shops remained committed to fat apps, or wanted cross platform capabilities, few embraced their innovations), or imagining that it had no influence (a lot of the current platform was either invented or implemented first by Microsoft, from IFRAMEs to most of CSS to XMLHTTP. Others, like behaviors and filters, died an ignoble death). While Microsoft is far from a perfect netizen, a lot of what they did has significantly and positively affected the web that we use today.
Rumor has it, and I am prone to believing it, that the web app platform was getting so powerful that the Internet Explorer team was disbanded: It was becoming capable enough that many corporations were switching many of their in-house applications to web apps, and the worry was that even with IE-only web apps, tied to IE-specific functionality, it was just a short jump to making them cross-platform (or allowing for parallel, slightly less capable cross platform options), dramatically reducing the lock-in of the Windows platform.
In any case, one Microsoft technology that is being particularly maligned is the infamous ActiveX.
Of course the term itself is a bit of a mess, and offers a classic example of Microsoft marketing gone awry (just like the naming disaster that was .NET. If people weren't fired over that debacle, then justice wasn't served). According to some Microsoft sources, ActiveX was a set of interfaces that could be added to a COM (Component Object Model) object to allow it to interact with the interface of an application. Generally encapsulated in .OCX files (OLE Custom Controls), these provided a replacement for the venerable VBX controls of yesteryear, providing a binary, language-neutral visual control that could be used in any ActiveX environment: Whether in a Visual Basic app, a Delphi app, an MS Access form, an Excel worksheet, or a Visual C++ app, you could make use of a single ActiveX control. At one gig we needed two synchronized animated graphs showing engine performance for a tradeshow presentation. One quick Delphi ActiveX control later, it was in the presentation (integrated right in the PowerPoint) and working great. That was the power of ActiveX.
ActiveX was also the technology behind plug-ins in Internet Explorer - Instead of begging the Netscape cabal to let them into the inner circle of Netscape plug-ins, ActiveX controls could be created by anyone and used in web pages (presuming some security hurdles were jumped, such as getting the controls signed). It was a free and open world for web extensions, and of course they proliferated by the thousands, though only a few remained when the dust settled.
Another definition is that ActiveX refers simply to COM controls themselves - if it's a COM control, then it's an ActiveX control. Another variant is that ActiveX refers to COM controls marked "Safe For Scripting".
In any case, COM was a great advance for the platform. It provided high performance, binary, language neutral, object-oriented controls that could be used throughout the system in a truly modular fashion. They could even be proxied across systems, or hosted in service modules (MTS, which became COM+ Component Services).
Seeing the value of this powerful, extensible, system-wide technology, the Internet Explorer team decided to implement a lot of its functionality via this mechanism. So long as you configured a component with the proper registry entries, and optionally implemented an interface stating its safety level, it was usable from script in Internet Explorer. An obvious, and incredibly powerful, example was the use of the XMLHTTP component (a part of the MSXML library, which itself is a variety of COM controls) from within Internet Explorer. Either side could be upgraded and changed independently, automatically benefiting the other side where desired. If you implemented visual controls, you could build specific functionality that couldn't be handled with traditional web technologies in something like Delphi or MFC/C++, and gain all of the advantages of the web model (such as the document flow layout) alongside extremely rich controls.
It helped a lot of shops start transitioning to web applications long before the web platform could do it on its own.
The problem with ActiveX, and the main reason why it's maligned (apart from the platform lock-in), is that several controls that were marked safe for scripting were not, in fact, safe for scripting: Either they were programmed sloppily, and opened holes for buffer overflow and other nefarious activities, or they had dangerous operations that should never have been allowed from within Internet Explorer. Whatever the case, they opened holes that shouldn't have been opened.
Specific implementations gave the whole technology - a modular, high-performance and highly extensible system - a bad name. It could be said that it deserved it, given that it didn't sandbox the operations of the scripted object, but that's an implementation detail: At the core it really is a fantastic foundation.
Tagged: [Software Development], [Programming], [Software-Development], [ActiveX]
A classic, forever repeating quandary when designing applications that store large numbers of data files (images, user files, supporting documents, etc) is whether to store the files in the database, or to store them in the filesystem with pointers in the database.
Consider a web application that tracks support tickets, where users can add supporting documents to their tickets. In many implementations the files are uploaded and stored in a filesystem location with a unique name or directory (often a GUID), and that unique filename is stored in the database correlated with the record. This means that record access necessitates both file system access along with the database access.
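As an illustration of the file-system-with-pointers pattern, here is a minimal sketch. It assumes an in-memory dictionary standing in for the ticket attachments table, and AttachmentStore is a hypothetical class, not any real library. The point is the two separate steps: the file write and the "database" insert are not in any shared transaction, and every read has to touch both stores.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: files live on disk under GUID names, and a dictionary
// stands in for the database table that maps a ticket id to those names.
public class AttachmentStore
{
    private readonly string _root;
    private readonly Dictionary<int, List<string>> _table = new Dictionary<int, List<string>>();

    public AttachmentStore(string root)
    {
        _root = root;
        Directory.CreateDirectory(root);
    }

    // Step 1 writes the file, step 2 records its name. If step 2 failed after
    // step 1 succeeded, an orphaned file would be left behind: nothing makes the
    // two steps atomic.
    public string AddAttachment(int ticketId, byte[] contents, string extension)
    {
        string storedName = Guid.NewGuid().ToString("N") + extension;
        File.WriteAllBytes(Path.Combine(_root, storedName), contents);

        List<string> names;
        if (!_table.TryGetValue(ticketId, out names))
        {
            names = new List<string>();
            _table[ticketId] = names;
        }
        names.Add(storedName);
        return storedName;
    }

    // Reading needs the "database" lookup and a separate file system read.
    public byte[] ReadAttachment(int ticketId, string storedName)
    {
        List<string> names;
        if (!_table.TryGetValue(ticketId, out names) || !names.Contains(storedName))
            throw new KeyNotFoundException("No such attachment for ticket " + ticketId);
        return File.ReadAllBytes(Path.Combine(_root, storedName));
    }
}
```

In a real system the dictionary would be a table with a foreign key to the ticket. The gap between the file write and the row insert is exactly where the inconsistencies described below creep in.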
There are significant disadvantages to this, including the lack of transactional integrity of the filesystem objects, the difficulty of management (trying to coordinate file system and database backups to be able to restore to a consistent state), the security issues, the lack of relational integrity (files could be deleted, records could be deleted without cleaning up the related files, and so on), among others.
On the flip side, the advantages of this technique are reduced load on the database server (e.g. you could offload file storage to a very large scale NAS device), as well as immediate file system access where appropriate (e.g. an administrator needs no special tools to browse the files, although this could be considered a detriment as well). Many developers also find the file system an easier model to implement for supporting files.
For those who prefer the file system in such scenarios, the transactional integrity deficiency of this technique will be fixed in Windows Vista (formerly Longhorn) and related technologies, which introduce a transaction-capable variant of NTFS (TxF). NTFS is already a journaled and reliable filesystem; TxF adds the ability for the filesystem to participate in distributed transactions, both intra-machine and inter-machine, with standard two-phase commit functionality. This means, for instance, that when the user adds a record that includes a supporting document, the file and the record could be created under a shared transaction in the middle tier (or even in the database if you use it as a conduit, storing and retrieving filesystem objects in the database logic), and if either fails, both fail (avoiding dirty data). Combine this with the ability to add complex database logic that probes and validates the correlating file system (SQL Server 2005 can host .NET code, so a trigger can robustly check for a file's existence when records are created, and delete it when they are removed), and it becomes a much more credible option.
(As an aside: distributed transactions, meaning transactions across heterogeneous resource managers, have traditionally been very, very slow. This new file-system transaction functionality most certainly isn't free, but where reliability is critical, which is almost always given the cost and uselessness of dirty data, it can represent a great improvement. Registry changes will also be able to participate in distributed transactions.)
Supporting Links
MSDN Page
Channel 9
Video on TxF (given that the long form name is the correctly descriptive Transactional NTFS, shouldn't the abbreviation be TxNTFS? Longer to say, but it seems superior to me)
Tagged: [Software Development], [Programming], [Software-Development], [Vista]
It always surprises me that large domains don't configure common mistypes of their subdomains in their DNS. For instance:
w.microsoft.com
ww.microsoft.com
wwww.microsoft.com
For that website it only works as http://www.microsoft.com, or http://microsoft.com. It seems logical that they should add the common derivatives in, point them to a multi-host-header site that does nothing but redirect to http://www.microsoft.com, and voila - A lot of sloppy-typing users save a bit of time and avoid a bit of frustration.
Microsoft actually makes use of a lot of their subdomains (e.g. research.microsoft.com, msdn.microsoft.com), however for smaller sites it could simply be a wildcard entry. Even if there are lots of subdomains, the redirect logic could do some analysis of the typed-in entry and figure out the likely destination, e.g. madn.microsoft.com probably wanted msdn.microsoft.com, and resrch.microsoft.com probably wanted research.microsoft.com.
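One way that "figure out the likely destination" logic could work is to pick the known subdomain with the smallest edit distance to what was typed. The following is a minimal sketch under some assumptions: a hand-maintained list of real, lowercase subdomains; Levenshtein distance as the similarity measure (my choice, not the only option); and a hypothetical threshold beyond which the redirect would simply fall back to www. SubdomainGuesser is an illustrative name.

```csharp
using System;
using System.Collections.Generic;

public static class SubdomainGuesser
{
    // Levenshtein distance: the minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b (dynamic programming).
    public static int EditDistance(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    // Returns the known subdomain closest to the typed one, or null when nothing
    // is within maxDistance edits. Known subdomains are assumed to be lowercase.
    public static string Guess(string typed, IEnumerable<string> known, int maxDistance)
    {
        string best = null;
        int bestDistance = int.MaxValue;
        foreach (string candidate in known)
        {
            int distance = EditDistance(typed.ToLowerInvariant(), candidate);
            if (distance < bestDistance)
            {
                best = candidate;
                bestDistance = distance;
            }
        }
        return bestDistance <= maxDistance ? best : null;
    }
}
```

With the known list { "www", "msdn", "research" } and a threshold of 2, madn maps to msdn (one substitution), resrch maps to research (two insertions), and wwww maps to www. The redirect site would then issue a redirect to the guessed host.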
Of course yafla.com isn't configured like that - I don't control the DNS in this case so I didn't have the option.
User measurement and tuning of computer hardware and components is largely a lost art - we buy high performance hardware within our budget, and it is what it is. When we require more speed, or more likely just buy to keep up with technological advances, we buy something new.
This came to mind as I was evaluating why the media performance between a couple of home PCs was relatively poor: For whatever reason, streaming media files were stuttering, and photo replication was taking far too long given the quantities of data involved. While I just lived with this for a while (it is just a couple of home PCs), I finally decided to get some real metrics to know what I was dealing with. I decided to look at the numbers from the file system level, rather than measuring just the network itself, so that any higher-level issues would show up too. Off I went with the fantastic application iozone (which also doubles as an excellent test utility to gauge the performance of various segregated storage systems, which will be another entry coming very soon), analyzing both the wired network and the wireless network.
To make a boring entry even more boring, it turns out that the source computer, running on an NVIDIA nForce motherboard, was using the nForce chipset networking adapter, with the secondary Marvell adapter disabled. All with stock settings. After running the first set of tests, and seeing that the 100Mbps network was yielding about 800KB/second of actual transfer (despite the links all reporting a 100Mbps signalling speed), I disabled that network adapter, enabled the Marvell, and switched the ethernet cable over.
With that simple action, network throughput suddenly jumped to 12MB/second, essentially the theoretical limit of 100Mbps ethernet (100Mbps divided by 8 bits per byte is 12.5MB/second, before protocol overhead). That's a 15x performance improvement just because I finally decided to measure it and do something about it. Now I think I'll play with the buffer settings to see what further benefits can be gained. Then I'll probably upgrade to 1000Mbps and start again (amazing how inexpensive 1Gbps networking equipment is now).
I always enjoy these exercises because inevitably you go down a path and learn more about a fringe - yet still important - element of our computing world. For instance, I picked up some interesting details about the file cache of Windows 2000 (which was largely unchanged in Windows 2003).
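iozone is the right tool for thorough numbers. For a quick sanity check of a share, though, a crude file-system-level probe is easy to write. The following is a minimal sketch that times sequential writes to a file, with a configurable buffer size so that buffer settings can be compared. It is not how iozone works internally. ThroughputProbe is a hypothetical name, and the target path would be a file on the share you want to test.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ThroughputProbe
{
    // Writes totalBytes to path in bufferSize chunks and returns the observed
    // rate in megabytes (10^6 bytes) per second, including the time to open the file.
    public static double MeasureWriteMBps(string path, long totalBytes, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        new Random(42).NextBytes(buffer); // random payload, so it isn't trivially compressible

        Stopwatch timer = Stopwatch.StartNew();
        // WriteThrough asks the OS not to hold writes in its cache, so the timing
        // reflects the transfer rather than memory speed.
        using (FileStream stream = new FileStream(path, FileMode.Create, FileAccess.Write,
                   FileShare.None, bufferSize, FileOptions.WriteThrough))
        {
            long remaining = totalBytes;
            while (remaining > 0)
            {
                int chunk = (int)Math.Min(buffer.Length, remaining);
                stream.Write(buffer, 0, chunk);
                remaining -= chunk;
            }
            stream.Flush(true); // push anything still buffered out to the target
        }
        timer.Stop();
        return totalBytes / 1e6 / timer.Elapsed.TotalSeconds;
    }
}
```

For example, calling MeasureWriteMBps with a path such as \\mypc\share\probe.bin (a hypothetical share), 200000000 bytes and a 64KB buffer, and comparing the result against the 12.5MB/second ceiling of 100Mbps ethernet, would quickly have exposed the 800KB/second adapter. Repeating the call with different buffer sizes is a cheap way to explore the buffer settings.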
A bit of an odd entry today, spurred by the general lack of awareness regarding these acronyms and their meaning: While they're undoubtedly old-hat for a database admin in a large enterprise (where they certainly play a critical role), they're less likely to be found amidst the parlance of smaller shops, or among professionals who function as hybrid developer/database architect/system architect. As many of the people who visit fall in those latter two categories, I thought it worth a quick overview. Even if you're a developer and these are all deployment concerns, you should know what the network guys are talking about when they discuss these concepts.
All three acronyms exist in the world of segregated storage systems, which in a nutshell is the requisitioning of storage capacity separate from computational requirements: Instead of calculating each server's needs as an island, as a composite of computational and storage needs, you instead pool the storage requirements and facilitate that via one of these storage technologies. Pooling brings some advantages of scale, some technological advantages, not to mention that your capacity utilization and peak performance will likely improve.
Segregated storage often allows for much greater scalability, allowing you to add disks and upgrades to the storage systems, transparently improving capacity and performance throughout the entire infrastructure (versus each system being an isolated performance unit).
So here's a brief overview of what each of the acronyms means, and how it applies.
NAS (Network Attached Storage)
Network attached storage generally describes servers (or "appliances") dedicated to the sole purpose of being file servers, often running a lightweight or specialized NAS-specific operating system (for instance Windows Storage Server 2003, or a specialized version of Linux). NAS systems with massive capacities, often with redundancy such as RAID, can be purchased for incredibly low prices these days. Many of them, including those built on Windows Storage Server 2003, carry no additional licensing fees (e.g. you can add a huge-capacity NAS device to serve your entire enterprise for only the cost of the box itself, with no additional per-user licensing issues). RAID (Redundant Array of Inexpensive Disks) basically means the system has enough redundancy that one or more of the hard drives can completely fail with no loss of data and no downtime, frequently with just a decrease in performance. Usually you can plug in a replacement drive while the system is running, and it'll automatically bring the new drive online and populate it, restoring performance. (I'm ignoring the misnomer "RAID 0", which is actually a performance technique that offers no redundancy.)
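To illustrate how a parity-based RAID level (RAID 5, for instance) survives a dead drive and repopulates its replacement, here is a minimal sketch. It assumes equal-sized blocks and a single parity block computed as the byte-wise XOR of the data blocks. Real controllers also rotate parity across the drives, which is omitted here. ParityDemo is a hypothetical name.

```csharp
using System;

public static class ParityDemo
{
    // The parity block is the byte-wise XOR of all data blocks (assumed equal length).
    public static byte[] ComputeParity(byte[][] dataBlocks)
    {
        byte[] parity = new byte[dataBlocks[0].Length];
        foreach (byte[] block in dataBlocks)
            for (int i = 0; i < parity.Length; i++)
                parity[i] ^= block[i];
        return parity;
    }

    // Because x ^ x == 0, XOR-ing the parity with every surviving block cancels them
    // out and leaves exactly the contents of the failed drive's block.
    public static byte[] Rebuild(byte[][] survivingBlocks, byte[] parity)
    {
        byte[] rebuilt = (byte[])parity.Clone();
        foreach (byte[] block in survivingBlocks)
            for (int i = 0; i < rebuilt.Length; i++)
                rebuilt[i] ^= block[i];
        return rebuilt;
    }
}
```

The same property explains the performance drop while a drive is missing: every read of the lost drive's data has to be recomputed from all the other drives.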
NAS systems generally support common file sharing protocols like CIFS/SMB (Windows), NFS (Unix/Linux), and so on, and usually integrate into Windows domains and Active Directory infrastructures for security purposes, so they seamlessly interoperate with your existing infrastructure. NAS is even making inroads in the home, with many alpha-geeks installing a very high capacity, high-performance NAS box for media files and centralized storage, supporting various other computing devices throughout the house.
Some NAS resources:
Windows Storage Server vendors
An inexpensive, high performance NAS starting point (the same company makes a highly lauded solution for the home, ~$1000 for 1TB. Here's a good entry about that product)
Iomega NAS servers
Dell PowerVault 754N
SQL Server 2000 I/O Configuration in a SAN/NAS Environment
Wikipedia NAS entry
Apart from being a destination for backups, NAS can also host SQL Server databases themselves (e.g. your database server is running on server A, but the actual data is on server B, managed by server A over your high speed network), and with certified hardware (WHQL) this configuration is supported by Microsoft. To do so you just need to create or restore the database to a UNC location.
e.g.
```sql
CREATE DATABASE SampleUNCDatabase ON
( NAME = Sample_dat,
  FILENAME = '\\mynas\db\sample.mdf',
  SIZE = 10MB,
  MAXSIZE = 2000MB,
  FILEGROWTH = 10MB)
LOG ON
( NAME = Sample_log,
  FILENAME = '\\mynas\db\sample.ldf',
  SIZE = 5MB,
  MAXSIZE = 25MB,
  FILEGROWTH = 5MB)
```
Those of you playing along at home will have been surprised by the following error.
```
Msg 5110, Level 16, State 2, Line 1
The file \\mynas\db\sample.mdf is on a network path that is not supported for database files.
Msg 1802, Level 16, State 1, Line 1
CREATE DATABASE failed. Some file names listed could not be created. Check related errors.
```
By default SQL Server doesn't support hosting databases on network locations, as there are some caveats that need to be considered (namely the throughput - NAS is accessed via a generalized file sharing protocol on top of a generalized transport protocol, often over a lower speed transport, and can kill database performance). You can enable UNC hosting by enabling trace flag 1807. Just make sure your NAS is accessed over a dedicated or low-usage Gbps or better network connection.
e.g.
```sql
DBCC TRACEON(1807)
CREATE DATABASE...
```
Success
You can read more about this at http://support.microsoft.com/default.aspx?scid=304261. This configuration is supported with appropriate hardware (which generally means "running against a NAS that runs Windows Storage Server 2003").
NAS cannot be used in SQL Server clustering scenarios. For that you need to look at a traditional or iSCSI SAN.
As NAS is operating at a higher level, hiding the details of the underlying storage, to defragment an NAS device you would have to do it on the device itself, specific to that NAS. You could do backups on the NAS itself, though in-use files like SQL Server's data files would need agents to be backed up online.
SAN (Storage Area Network)
While a NAS operates at a higher, more abstract level (the file share level, agnostic to the underlying file technologies and hiding the actual storage topology), a SAN functions at a much lower level.
SANs operate at the virtual-disk access level, using blocks and "physical locations" to define what to read and write, with the client devices taking a more direct role in the "layout" of the data (at least as far as the client is concerned): Client systems are allocated blocks of SAN storage, which usually appear as a bona fide drive on the client system (with appropriate drivers), and are connected via a dedicated 1 to 4Gbps fibre network. Generally only one client can access a logical device on a SAN at a time; however, with SQL Server clustering you can point several database servers at the same logical device, and if one fails another takes over the device (though still only one at a time). The protocol on the SAN fibre network is usually SCSI.
SANs are generally very expensive, and are usually the domain of very large enterprises. As SANs operate at a much lower level, basically operating as a dumb bank of bits and blocks, these devices can become fragmented, though defragmentation would have to operate at the logical disk level, and generally needs to be performed by the PC that "owns" that logical disk. As SANs appear to the operating system as a disk - just as if it were an internal drive directly connected to the client - there are no limitations on its use beyond those that exist for a local drive.
Many SANs have a value-add in the form of snapshot functionality, where they can take an image of a logical drive and store it somewhere else (perhaps as an online whole-volume backup). While this seems trivial, they can usually do it while the volume is online and being written to, via a transaction log sort of architecture. This can be very valuable in many scenarios.
Some SAN resources:
Wikipedia entry on SANs
Windows SAN Integration Technologies
iSCSI
iSCSI is basically the SCSI disk control protocol carried over IP (Internet Protocol). The benefit is that you can access a storage device over anything that can relay IP, including ethernet, wireless, or even the public internet. Much like a SAN, iSCSI storage is a dumb-bag-of-bits, and the client that owns a block of data is responsible for its management.
iSCSI has two roles of real interest: the target (the dumb-bag-of-bits that's listening and responding to iSCSI requests), and the initiator (the client computer on which the virtual drive has been mounted). Initiators exist for virtually all modern operating systems, and there are even targets for many operating systems that allow them to operate as bags-of-bits (if you had a general purpose server with a huge array of under-utilized hard drives, and adequate network bandwidth, you could allocate some of that space to act as a storage drive for another server). Alternatively, there are dedicated network appliances that act as iSCSI targets.
iSCSI is appearing in some inexpensive forms, and most iSCSI solutions fall pricewise somewhere between NAS and SANs. Like SANs, many iSCSI solutions have snapshot functionality. Also like SANs, iSCSI storage networks can be used for Windows clustering solutions (as of a service pack to come in early 2006) - for instance in SQL Server clustering.
Some iSCSI resources:
Wikipedia entry on iSCSI
Microsoft iSCSI support (including initiator)
Windows iSCSI target
Free Linux iSCSI target
While this wasn't intended as a complete guide to these technologies, hopefully it has given enough of an overview that there is an appreciation of what they are, and how they might fit in most enterprises.