
When Java first hit the development ecosystem, to many it wasn't just a way of doing efficient, high-level development; it became a new religion. It wasn't enough to use Java as the glue between existing code, or even as the overwhelming bulk of your solution. A partial-Java solution simply wasn't good enough.
Instead, your product had to be 100% Pure Java. The still-sought-after eventual goal was a complete Java solution, from applications right down to the operating system, with only the smallest possible binary kernel, if even that. All of this would run on a Java-aware processor, engineered specifically for Java.
Sun created a "100% Pure Java" campaign to push this philosophy, complete with banners and designations for appropriately certified software, and promoted the label as a highly desirable one. Users were led to feel that mixed solutions were impure and somehow dirty: Are you some sort of nut, running an impure solution dirtied with pointer-munging, buffer-overflow-prone C code? While there were (and still are) ways to call native code, they were discouraged.
Of course, there is a lot of validity to this agenda. Chief among its merits is that pure Java solutions are, in theory, cross-platform, with no ties to external technologies. Compare this to a solution leveraging C libraries, which requires a rebuild, or a prebuilt binary, for every distinct target platform. Additionally, Java could only enforce its sandbox and extensive security constraints if you stayed within the world of Java, so callouts to native code represented a risk.
In the real world, though, it often meant that developers were constantly solving long-conquered problems, reinventing in Java solutions that had long existed elsewhere, or waiting until adequate libraries eventually appeared. Developers were pressured to use Java alone even when it was a hammer and the problem really needed a chisel.
Thankfully, .NET hasn't been pushed in such a single-minded way (even if some of its champions, including some at Microsoft, have foolishly taken up the same misguided cause, turning it from a justified part of the solution into a religion: .NET! .NET! .NET! .NET!). Indeed, Microsoft itself has always facilitated, and even advocated, "impure" solutions. Much of the .NET Framework, for example, is actually a very thin veneer over existing Win32 facilities and libraries; it was either that, or version 1.0 would have shipped with a much smaller, much less efficient library.
This "orchestration layer over native code" implementation is a large part of why .NET hasn't suffered the performance difficulties that Java has. Microsoft chose to leverage what they had already built, maximizing both performance and the breadth of the library.
This advantage isn't limited to Microsoft, though; developers can use the same functionality. .NET offers very simple COM interop and P/Invoke facilities for leveraging "legacy" code (or even new code developed in a best-solution, non-.NET technology), allowing you to easily use your existing DLLs and COM libraries as first-class partners in your .NET solutions, even if they're written in "dirty" languages.
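As a rough illustration of how little ceremony P/Invoke needs, the sketch below declares a single Win32 export and calls it like any static method. It assumes a Windows host (it imports GetCurrentProcessId from kernel32.dll); the class name is made up for the example.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class PInvokeSketch
{
    // Declare the native export; the runtime loads kernel32.dll on first call
    // and marshals the DWORD return value to a uint.
    [DllImport("kernel32.dll")]
    public static extern uint GetCurrentProcessId();

    public static void Main()
    {
        uint nativeId = GetCurrentProcessId();
        int managedId = Process.GetCurrentProcess().Id;

        // Both calls should report the same process ID.
        Console.WriteLine("native: {0}, managed: {1}", nativeId, managedId);
    }
}
```

COM libraries are consumed along similar lines, through an interop assembly that Visual Studio (or tlbimp.exe) generates from the type library.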
I take advantage of this functionality regularly, using existing best-solution libraries and functions regardless of whether they're pure .NET or not. For instance, to create the static version of the "best of" blog entries, I spent maybe two hours writing a quick transformation tool. It imports the "best of" RSS feed (which isn't included in the normal category lists), applies XSL transformations to produce XHTML (using extension objects in the XSL where XSLT alone wasn't adequate, for instance to HTML-decode the description block of the RSS), and creates an index page.
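The extension-object technique can be sketched with XslCompiledTransform and XsltArgumentList.AddExtensionObject. This is not the tool's actual stylesheet: the RssHelpers class, the urn:example:rss-helpers namespace and the simplified RSS input are all hypothetical, and it uses WebUtility.HtmlDecode (.NET 4.0 and later) where code of that era would have used HttpUtility.HtmlDecode from System.Web. Emitting decoded markup unescaped would additionally require disable-output-escaping, which this sketch leaves out.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper exposed to the stylesheet; its public instance
// methods become callable as XPath extension functions.
public class RssHelpers
{
    public string HtmlDecode(string encoded)
    {
        return WebUtility.HtmlDecode(encoded);
    }
}

public static class XsltExtensionSketch
{
    // Illustrative namespace URI; it only has to match between the
    // stylesheet and the AddExtensionObject call.
    const string HelperNs = "urn:example:rss-helpers";

    const string Stylesheet =
        "<xsl:stylesheet version='1.0' " +
        " xmlns:xsl='http://www.w3.org/1999/XSL/Transform' " +
        " xmlns:ext='" + HelperNs + "' exclude-result-prefixes='ext'>" +
        " <xsl:output method='xml' omit-xml-declaration='yes'/>" +
        " <xsl:template match='/'>" +
        "  <div><xsl:for-each select='rss/channel/item'>" +
        "   <p><xsl:value-of select='ext:HtmlDecode(string(description))'/></p>" +
        "  </xsl:for-each></div>" +
        " </xsl:template>" +
        "</xsl:stylesheet>";

    public static string Transform(string rssXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        // Register the helper object under the namespace the stylesheet uses.
        var args = new XsltArgumentList();
        args.AddExtensionObject(HelperNs, new RssHelpers());

        using (XmlReader input = XmlReader.Create(new StringReader(rssXml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, args, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        // After XML parsing the description text is "caf&eacute;", an HTML
        // entity that plain XSLT has no way to decode.
        string rss = "<rss version='2.0'><channel><item>" +
                     "<description>caf&amp;eacute;</description>" +
                     "</item></channel></rss>";
        Console.WriteLine(Transform(rss));
    }
}
```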
One goal for this solution was that the resulting pages be fully XHTML compliant and pass the W3C validity checks. While I could easily see that the pages rendered fine in Mozilla, Firefox, IE and Opera, technically there were a couple of deviations from the spec. Some of these errors and warnings were caused by unavoidable transformation issues, while others came from minor markup errors in the original blog entries (some from my own mistakes when coding by hand, others from Radio Userland's "helpful" automatic "cleanup" of HTML; it is remarkable how often auto-formatting is detrimental).
HTML Tidy to the rescue.
I had several options with HTML Tidy, the easiest of which would have been to ShellExecute out to the EXE, telling it to process an existing file. I could have taken more time and tried to build a managed C++ version of Tidy, but I really didn't want to spend that much time.
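For comparison, the shell-out option might look roughly like this. It is a sketch that launches the Tidy console tool through Process.Start rather than calling ShellExecute directly, assumes tidy.exe is on the PATH, and treats the file names as placeholders; the flags are standard Tidy command-line options.

```csharp
using System;
using System.Diagnostics;

public static class TidyProcessSketch
{
    // Builds a command line for the Tidy console tool:
    // -q quiet, -utf8 UTF-8 input/output, -asxhtml convert to XHTML,
    // -c (clean) replace presentational tags with CSS, -o write to a file.
    public static string BuildArguments(string inputPath, string outputPath)
    {
        return string.Format("-q -utf8 -asxhtml -c -o \"{0}\" \"{1}\"", outputPath, inputPath);
    }

    // Runs tidy and returns its exit code. Assumes "tidy.exe" can be found on
    // the PATH; adjust for your install.
    public static int RunTidy(string inputPath, string outputPath)
    {
        var startInfo = new ProcessStartInfo("tidy.exe", BuildArguments(inputPath, outputPath))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process tidy = Process.Start(startInfo))
        {
            tidy.WaitForExit();
            // Tidy exits with 0 for no problems, 1 for warnings, 2 for errors.
            return tidy.ExitCode;
        }
    }
}
```

The cost of this route is a process launch per file plus a round trip through the file system, which is what the DLL approach avoids.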
I decided to have a bit more fun (not to mention build a more integrated, higher-performance solution) and use the Tidy DLL directly from the little .NET utility. I grabbed the Tidy source code (Tortoise CVS is a great tool for this, in this case using :pserver:anonymous@cvs.sourceforge.net:/cvsroot/tidy), updated the included MSVC projects to Visual Studio 2005, and added them to the transformation utility's solution. I set the output directory of the Tidy DLL project to the build directory of my .NET utility (in this case $(SolutionDir)\blogStatic\bin\$(ConfigurationName)), so the DLL lands next to the executable, where DllImport can find it. The MSVC build worked perfectly right away, which is impressive given that Win32 isn't an officially supported build.
To call the Tidy DLL functions I of course had to add the DllImport signatures, declaring only the ones I needed.
using System;
using System.IO;
using System.Runtime.InteropServices;

// Mirrors TidyBuffer from buffio.h in the Tidy source used here.
// Newer Tidy releases add a TidyAllocator* as the first field;
// if yours does, add a leading IntPtr so the layouts match.
[StructLayout(LayoutKind.Sequential)]
struct TidyBuffer
{
    public IntPtr bp;        // Pointer to bytes
    public uint size;        // # bytes currently in use
    public uint allocated;   // # bytes allocated
    public uint next;        // Offset of current input position
}

class FileClean
{
    // Match the calling convention your libtidy build exports; if it uses
    // __cdecl, add CallingConvention = CallingConvention.Cdecl to each import.
    [DllImport("libtidy.dll")]
    public static extern IntPtr tidyCreate();

    [DllImport("libtidy.dll")]
    public static extern int tidyParseFile(IntPtr tidyPointer,
        [MarshalAs(UnmanagedType.LPStr)] string fileName);

    [DllImport("libtidy.dll")]
    public static extern int tidyParseBuffer(IntPtr tidyPointer, ref TidyBuffer tidyBuffer);

    [DllImport("libtidy.dll")]
    public static extern int tidyCleanAndRepair(IntPtr tidyPointer);

    [DllImport("libtidy.dll")]
    public static extern int tidySaveFile(IntPtr tidyPointer,
        [MarshalAs(UnmanagedType.LPStr)] string outFileName);

    // tidyRelease is declared void in tidy.h, so it returns nothing here.
    [DllImport("libtidy.dll")]
    public static extern void tidyRelease(IntPtr tidyPointer);

    [DllImport("libtidy.dll")]
    public static extern int tidySetCharEncoding(IntPtr tidyPointer,
        [MarshalAs(UnmanagedType.LPStr)] string encoding);

    // optionId is a TidyOptionId value; value is Tidy's Bool (0 = no, 1 = yes).
    [DllImport("libtidy.dll")]
    public static extern int tidyOptSetBool(IntPtr tidyPointer, int optionId, int value);

    public static bool CleanFile(string outputFileName, MemoryStream docDataStream)
    {
        int result = -1;
        IntPtr tidyPointer = tidyCreate();
        try
        {
            // We want the resulting file to be UTF-8 encoded.
            tidySetCharEncoding(tidyPointer, "utf8");

            byte[] docDataArray = docDataStream.ToArray();
            TidyBuffer tidyBuffer;
            tidyBuffer.size = (uint)docDataArray.Length;
            tidyBuffer.allocated = (uint)docDataArray.Length;
            tidyBuffer.next = 0;

            // Pin the managed array so the GC cannot move it while Tidy reads it.
            GCHandle pinHandle = GCHandle.Alloc(docDataArray, GCHandleType.Pinned);
            try
            {
                tidyBuffer.bp = Marshal.UnsafeAddrOfPinnedArrayElement(docDataArray, 0);
                if (tidyParseBuffer(tidyPointer, ref tidyBuffer) >= 0)
                {
                    tidyOptSetBool(tidyPointer, 29, 1); // TidyMakeClean
                    tidyOptSetBool(tidyPointer, 23, 1); // TidyXhtmlOut
                    if (tidyCleanAndRepair(tidyPointer) >= 0)
                    {
                        result = tidySaveFile(tidyPointer, outputFileName);
                    }
                }
            }
            finally
            {
                pinHandle.Free();
            }
        }
        finally
        {
            tidyRelease(tidyPointer);
        }

        // tidySaveFile returns 0 (clean), 1 (warnings only) or 2 (errors);
        // a negative value means the file could not be written.
        return (result == 0 || result == 1);
    }
}
Most of this should be self-evident; however, the two tidyOptSetBool calls may be a little cryptic. For the sake of brevity I haven't used the constants, but 29 is the TidyMakeClean value of the TidyOptionId enum (see tidyenum.h), and 23 is the TidyXhtmlOut value. Together they tell Tidy to clean the document and output it as XHTML. Note that I've also set the encoding to UTF-8.
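If you would rather not hard-code 29 and 23, one option is a small enum mirroring the two TidyOptionId entries, or a runtime lookup through libtidy's tidyOptGetIdForName, which maps a configuration option name such as "clean" or "output-xhtml" to its ID. This is a sketch: the enum subset and the TidyOptions class are illustrative, and it assumes the same libtidy.dll as the listing above.

```csharp
using System;
using System.Runtime.InteropServices;

// Hand-copied subset of TidyOptionId from tidyenum.h. These numbers match
// the ones used above but differ between Tidy releases, so verify them
// against the header you actually build against.
public enum TidyOptionId
{
    TidyXhtmlOut = 23,
    TidyMakeClean = 29
}

public static class TidyOptions
{
    [DllImport("libtidy.dll")]
    public static extern int tidyOptSetBool(IntPtr tidyPointer, TidyOptionId optionId, int value);

    // Resolves an option by its configuration name, which stays stable even
    // when the numeric enum values shift between releases.
    [DllImport("libtidy.dll")]
    public static extern TidyOptionId tidyOptGetIdForName(
        [MarshalAs(UnmanagedType.LPStr)] string optionName);

    // Same effect as the two numeric tidyOptSetBool calls in CleanFile.
    public static void EnableCleanXhtml(IntPtr tidyPointer)
    {
        tidyOptSetBool(tidyPointer, tidyOptGetIdForName("clean"), 1);
        tidyOptSetBool(tidyPointer, tidyOptGetIdForName("output-xhtml"), 1);
    }
}
```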
Voila: after transforming the RSS into the memory stream as quasi-conformant HTML, I passed the stream to this function along with the desired output filename, and out came a cleaned-up, valid XHTML document. Pedants everywhere were thwarted from pointing out minor deviations from the standard. I could also have had Tidy write to another buffer and done follow-up processing in .NET, but this was sufficient.
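Writing to a buffer instead of a file could look something like the following, using libtidy's tidyBufInit, tidySaveBuffer and tidyBufFree. Treat it as an untested sketch: the TidyOutputBuffer and TidyBufferOutput names are hypothetical, it assumes the document has already been parsed and repaired with UTF-8 output, and the struct layout must match the buffio.h of your Tidy build.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Same layout as the TidyBuffer struct above (older buffio.h). Newer Tidy
// releases put a TidyAllocator* first; add a leading IntPtr if yours does.
[StructLayout(LayoutKind.Sequential)]
public struct TidyOutputBuffer
{
    public IntPtr bp;
    public uint size;
    public uint allocated;
    public uint next;
}

public static class TidyBufferOutput
{
    [DllImport("libtidy.dll")]
    static extern void tidyBufInit(ref TidyOutputBuffer buffer);

    [DllImport("libtidy.dll")]
    static extern void tidyBufFree(ref TidyOutputBuffer buffer);

    [DllImport("libtidy.dll")]
    static extern int tidySaveBuffer(IntPtr tidyPointer, ref TidyOutputBuffer buffer);

    // Has Tidy write the cleaned document into a buffer it allocates, copies
    // the bytes into a managed string, then lets Tidy free its memory.
    public static string SaveToString(IntPtr tidyPointer)
    {
        var buffer = new TidyOutputBuffer();
        tidyBufInit(ref buffer);
        try
        {
            if (tidySaveBuffer(tidyPointer, ref buffer) < 0)
            {
                return null; // Tidy reported a severe error.
            }

            byte[] bytes = new byte[buffer.size];
            if (buffer.size > 0)
            {
                Marshal.Copy(buffer.bp, bytes, 0, (int)buffer.size);
            }
            return Encoding.UTF8.GetString(bytes);
        }
        finally
        {
            tidyBufFree(ref buffer);
        }
    }
}
```

The returned string could then go through further .NET processing (another XSL pass, for example) before being written out.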
This is a trivial example, but it really exemplifies the great value of .NET's easy interoperability. With it I could instantly leverage existing code without having to search out bastardized ported versions; instead, I could go right to the source.