Sunday, October 02 2005
I'm really looking forward to the beautiful colours of fall that are just on the verge of exploding here in Southern Ontario. We're usually pretty good about taking advantage of the beautiful weather and scenery, but I think we'll be extra active this season. I may have to upgrade to a "Pro" Flickr account just to have the upload bandwidth.
   
Sunday, October 02 2005

About a year ago I released a relatively simple command line utility - PureJPEG - to filter EXIF data (along with application data blocks, thumbnails, and so on) out of JPEGs. The utility took me an evening to throw together (it's a pretty straightforward C++ app), and was really just an offshoot of research into some image search algorithms I was working on - a project that I need to return to at some point.

Nonetheless, it really filled a need: an enormous number of people were unaware of the kinds of EXIF data in their images, or of the impact it had on file size (many images on the net are over 50% EXIF and thumbnail data, in cases where it is just extraneous waste). Since I released the tool I've had literally tens of thousands of downloads. Interesting note: a disproportionately large number of the downloads are from people located in Russia. I have to guess that it piqued the interest of a Connector in Russia (terminology courtesy of The Tipping Point - great book), and they spread it to their network. I think I'll get a Russian translation of the page put up. It is enormously satisfying as a software developer to see that something I've done has helped someone, however marginally.
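For the curious, here is a minimal sketch of the general idea, in Python rather than the C++ of the actual tool. It is not PureJPEG's code: the function name and the choice of which segments to keep are my assumptions for illustration. It walks the JPEG segment headers, drops the APPn and comment segments (EXIF lives in APP1, along with its embedded thumbnail), and copies everything from the start of the scan data onward untouched.

```python
import struct

SOI = b"\xff\xd8"        # start-of-image marker
SOS = 0xDA               # start of scan: compressed image data follows
COM = 0xFE               # comment segment
# Assumption: keep APP0 (JFIF) and APP14 (Adobe), since decoders may use them.
KEEP_APP = {0xE0, 0xEE}


def strip_metadata(jpeg: bytes) -> bytes:
    """Return a copy of `jpeg` without EXIF/APPn and comment segments."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from SOS onward is image data; copy it untouched.
            out += jpeg[pos:]
            return bytes(out)
        # The segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        is_app = 0xE0 <= marker <= 0xEF
        droppable = (is_app and marker not in KEEP_APP) or marker == COM
        if not droppable:
            out += jpeg[pos:end]
        pos = end
    raise ValueError("no SOS marker found")
```

Calling `strip_metadata(open("photo.jpg", "rb").read())` on a hypothetical `photo.jpg` would return the smaller byte string. A real tool would need to be more careful about padding bytes and malformed files than this sketch is.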

In any case, on the topic of EXIF - I recently upgraded to a Canon Digital Rebel XT. I absolutely love this camera, but for whatever reason it sticks the camera serial number in the EXIF. Perhaps the serial number of a camera isn't really top secret, but nonetheless this seems like a completely needless piece of info to be sitting in every image I put online or elsewhere. It just seems like a piece of info that could be used in insurance fraud, retail deception, or some other nefarious activity. Perhaps it isn't secret, but it really isn't the sort of thing you shout from the rooftops (similar to how people and organizations obscure license plate numbers, yet they're really ridiculously un-private).

A much more profound privacy concern could come up when cameras finally start making use of the geographical coordinate points sitting largely unused in EXIF currently. These data points, storing items like latitude, longitude, and altitude, will make for absolutely brilliant geocoded photo databases once cameras start incorporating a GPS. For instance many cell phones are starting to incorporate a GPS to accommodate e911 requirements, and of course many cell phones already have onboard cameras, so it's inevitable that the technologies will collide. Imagine having the ability to search in a tool like Picasa for photos taken at a particular house, or in a particular park, without having the hassle of manually adding keywords categorizing each photo. Imagine a shared service like Flickr with brilliant locational searches.
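As a rough illustration of what those data points look like: EXIF stores latitude and longitude as three rational numbers each (degrees, minutes, seconds) plus a reference letter for the hemisphere. The sketch below converts that form into the signed decimal degrees a location search would want. The function name is mine, and the sample values are approximate coordinates near the CN Tower chosen for illustration, not data from a real photo.

```python
from fractions import Fraction


def to_decimal_degrees(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) rationals to signed degrees.

    `dms` is three (numerator, denominator) pairs; `ref` is "N", "S", "E" or "W".
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        # South and west are negative in signed decimal degrees.
        value = -value
    return float(value)


# Hypothetical sample: roughly 43 deg 38' 33.36" N, 79 deg 23' 13.56" W
latitude = to_decimal_degrees([(43, 1), (38, 1), (3336, 100)], "N")
longitude = to_decimal_degrees([(79, 1), (23, 1), (1356, 100)], "W")
```

With coordinates in this form, "photos taken in a particular park" becomes a simple bounding-box comparison on two numbers.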

Even better would be cameras that also stored the attitude and direction in which each photo was taken. Imagine seeing a cityscape with view cones emanating out, with colour coded focus zones (which can be determined from a variety of other EXIF data points). With a clean GPS signal, you could tell which photos were taken of someone sitting on the steps of city hall, out of the CN Tower looking towards Toronto Island, or towards the leaning tower of Pisa looking West at dusk during late August, all without relying upon haphazard, scattershot user categorizing and captioning.

When this technology finally hits the mainstream - the merging of quality digital cameras and GPSs (likely through our phones) - the impact is going to be absolutely profound, and it will completely change how we archive and access our images.

[SEE FOLLOWUP - http://www.yafla.com/dforbes/2005/10/06.html#a100, and http://www.yafla.com/dforbes/2005/11/29.html#a201]

   
Sunday, October 02 2005

One of the justified concerns when using an int identity as your surrogate primary key is that you'll exceed the capacity of the data type. For example, if you accept the defaults, with your identity values seeded at 1 with an increment of 1, you have the capacity to store 2,147,483,647 records. While that sounds like a lot of records, and it most certainly is far beyond the lifetime size of most databases, it does have the potential of being exhausted in massive databases, or in databases that see lots of rolled-back transactions (which still use up identity values). If it's a realistic possibility that you'll exceed 2 billion records, consider using one of the larger data types, such as a bigint. Avoid using the larger data types unless realistically necessary, however, as there is a storage and I/O cost that needs to be factored in: a bigint takes 8 bytes against 4 for an int, and that cost is repeated in every index and foreign key that carries the key.

Another potential solution is to take advantage of the negative range of the signed int. You could do this by seeding your identity values with -2147483648, incrementing from there. This will make your first record IDs less human friendly (e.g. CustomerID -2147483648 instead of CustomerID 1), but it will double the identity range available, offering up 4 billion+ identity values.
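The arithmetic behind those numbers can be sketched in a few lines of Python. This is just a back-of-the-envelope calculation, not anything SQL Server exposes; the function name and its parameters are made up for illustration.

```python
INT_MIN = -2**31         # -2,147,483,648: lowest value of a signed 32-bit int
INT_MAX = 2**31 - 1      #  2,147,483,647: highest value of a signed 32-bit int
BIGINT_MAX = 2**63 - 1   # highest value of a signed 64-bit bigint


def identity_capacity(seed, increment=1, max_value=INT_MAX):
    """Count the identity values available from `seed` up to `max_value`."""
    if increment <= 0:
        raise ValueError("this sketch only handles positive increments")
    if seed > max_value:
        return 0
    return (max_value - seed) // increment + 1


print(identity_capacity(1))                         # 2147483647
print(identity_capacity(INT_MIN))                   # 4294967296
print(identity_capacity(1, max_value=BIGINT_MAX))   # 9223372036854775807
```

Note that the count covers identity values handed out, not rows stored, so rolled-back inserts eat into it as well.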

You could also do this in existing, already populated tables by resetting the seed to a negative value, for instance:

DBCC CHECKIDENT ('YourTableName', RESEED, -2147483648)
However, this will lead to insert issues (as it'll be inserting at the head of the data if you've cluster indexed on your primary key), and the identity value will be reset back up to the highest value already in the column the next time you call
DBCC CHECKIDENT('YourTableName')


About the Author
Dennis Forbes is a Toronto-based software architect. While focused primarily on the .NET and SQL Server worlds, Dennis frequently ventures outside of this comfort zone into game development and image processing. He has been published in several industry magazines, has been quoted in the Wall Street Journal and has been interviewed by NPR.

He is a vice president and lead software architect at an innovative New York City hedge fund back-office services firm.

Dennis has been working on solutions for the financial, telecommunications, and power generation markets for over 15 years.





 