After some pretty rampant speculation, it turns out that the big Google/Sun announcement was much ado about nothing. Basically they announced some strategic partnerships and cross-promotional agreements, for instance the Google Toolbar being bundled with the JVM (ignore the fact that this sort of thing usually disenfranchises customers). Zzzzzzz.
The world didn't change today, and it turns out that my use of the word "baseless" a couple of days ago was entirely on the mark. Phew, I was worried that I would look pretty silly about that.
What I find remarkable, though, are the things that people thought Google would pull out of its hat today: from an amazing AJAX-enabled web Office suite that would eviscerate Microsoft Office, to a full-scale operating system (again to attack Microsoft...all things revolve around Microsoft). The expectations were super-sized. While Google is undoubtedly a very technically capable firm full of some extremely clever people, and they've got a brilliant business plan that allows them to put more money and resources into web apps than just about anyone, the leap from Google Maps and Google Mail to an Office competitor is colossal.
Of course we've seen this sort of ridiculous over-estimation of Microsoft's competitors before. It's been the Year That Linux Takes The Desktop for half a decade now (isn't Microsoft supposed to be dead by now?), and the great Java Office suite is always just a little ways from being viably competitive, and of course the most famous of all was Netscape: Here was a company that put together a piece of software of a complexity level comparable to Windows accessory applications, and suddenly they were perceived as the company that was capable of anything - toss together a full-scale, feature rich operating system and all supporting applications? Why not! Netscape, despite delivering a couple of fairly simplistic applications, could accomplish anything in the eyes of many. Well, actually they couldn't, and their browser codebase started to rot until they were irrelevant, with scavengers picking at the corpse. Whoops!
I root for no side in the great Google versus Microsoft fight, and ultimately I hope the competition serves the consumer very well. However, there needs to be some realism in the expectations that people are setting.
A psychology study performed in 1977 - The College-Bowl Study (Ross, Amabile, and Steinmetz, 1977) - demonstrated a fascinating piece of human psychology, which was that people would over- or underestimate people's intelligence relative to their own, attributing undemonstrated qualities to them, based upon role or situation. They called this an "attribution error".
In the study, subjects were split into two groups: the quizmasters and the contestants. The quizmasters were tasked with coming up with some challenging-but-not-impossible questions, which they then presented to the contestants as pairs. Invariably the contestants did poorly, as the quizmasters naturally relied upon their own unique proficiencies for material: Whether they quizzed about classical music, famous art, the Altair 8800, or chemistry, their knowledge (and thus what they thought was "challenging but not impossible") differed greatly from that of the contestants, as should be expected among a diverse group of people.
Amazingly, when asked afterwards about each other's intelligence, the contestants overwhelmingly estimated the intelligence and knowledge of the quizmasters as greater than their own. Similarly, in a follow-up study an observer was added, and again the observer believed that the quizmaster was more intelligent than the contestant.
Of course this was a completely ridiculous conclusion: Not only were they randomly assigned arbitrary roles, but general knowledge tests applied across the group showed no correlation between role and intelligence. Logically it seems probable that there is a tremendous bulk of knowledge that the contestant holds that the questioner does not, but because it wasn't demonstrated it was unaccounted for. Out of "sight", out of mind.
This basic human tendency is pervasive, and it goes both ways: Some of us underestimate ourselves because we don't have the (often artificial, superficial, or temporal) domain knowledge of others ("I just saw an episode of Numbers, and boy do I feel dumb now..."), while others underestimate people who don't share their particular grab-bag of facts ("Boy that guy is an idiot! The guy didn't even know what OPML is!"). We nerds are particularly guilty of this, discounting the incredible array of knowledge and skills that people in other fields have, instead judging them on their knowledge of Linux distros or esoteric Windows shortcut keys. This is a tremendous vice.
Take advantage of this human trait! The next time you're having a big meeting, bone up on fringe facts and edge questions for your peers. Learn some irrelevant facts about an uncommon area - for instance beetles or metallurgy - and bring them up at every opportunity. I just took advantage of this myself, talking about the fairly obvious observations of a 28-year-old study. Aren't I clever.
I've taken a look at Microsoft's competitor to Google Maps several times, and each time I've been struck by the incredibly poor coverage of Canada: Not only is the search completely useless for Canadian addresses (even when you've very clearly indicated that you're looking in Canada, which is odd as they had great data on Canada in ancient MapPoint releases), the hilarity is compounded by the fact that the satellite imagery stops right at the border. Whether it's an imagery rights issue or not, it is quite contrary to the whole "Earth" thing in the product name. Maybe Microsoft Virtual United States of America on Earth is a more accurate name.
Of course Microsoft, being largely an American company, is entirely within its rights focusing on the US marketplace, just as I'm entirely within my rights to complain about it. It seems odd that something like satellite imagery has national boundaries, and it seems more likely that some product manager deep within the intestines of Microsoft decided that the hassle and storage of dealing with Canada wasn't worth the bother, and thus was it wiped from the map. A bit of a foolish decision given that there's a fair number of us, and given that it's cold most of the year we all tend to have high speed connections and a predisposition to spending lots of time online.
An article was mentioned on Slashdot regarding Google's (or Yahoo's) creation of some sort of web-based Office suite. While the story is actually baseless speculation (it is supported as much as the title of this entry is), it did make me pause and contemplate where Google is going next. Consider for a moment that Google's primary innovation hasn't really been new products or technologies, but rather a business model and transactional efficiency that allows them to offer unprecedented amounts of processing, storage and bandwidth for users: Google has managed to make money offering services that most thought prohibitively costly. Google has absolutely redefined the market, and continues to do so with each release.
Given that Google has become a market leader, I see no reason why Google needs to continue to be tied to DHTML. Even with the so-called AJAX, HTML is realistically too coarse for something as rich as an Office suite (don't get me wrong - I was making highly dynamic engine control systems, using "AJAX" style methods, over 5 years ago. Nonetheless it is primarily a document layout technology, and shouldn't be shoehorned into every need). It would be a waste of engineering manpower to attempt to solve that problem with the wrong technologies.
I can entirely foresee Google completely splitting from HTML for some products - Google is one company that could release new services accessible via RDP or some other streaming graphical or vector format, and it would be immediately embraced by the community. If Google didn't leverage an existing technology, but instead invented something new, they would undoubtedly release it as a standard. A real Web 2.0 would be born.
Some time back, around a year ago, I released a relatively simple command line utility - PureJPEG - to filter EXIF data (along with application data blocks, thumbnails, and so on) out of JPEGs. The utility took me an evening to throw together (it's a pretty straightforward C++ app), and was actually just a research branch of some image search algorithms I was working on - a project that I need to return to at some point.
Nonetheless, it really filled a need: An enormous number of people were unaware of the types of EXIF data in their images, or the impact that it had on their data size (many images on the net are over 50% EXIF and thumbnail data, in cases where it is just extraneous waste). Since I released that tool I've had literally tens of thousands of downloads (interesting note: a largely disproportionate number of the downloads are from people located in Russia - I have to guess that it piqued the interest of a Connector [terminology courtesy of The Tipping Point - great book] in Russia, and they spread it to their network. I think I'll get a Russian translation of the page put up). It is enormously satisfying as a software developer when I see that something I've done has helped someone, however marginally.
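For anyone curious what filtering application data blocks out of a JPEG involves, here is a minimal Python sketch. To be clear, this is not PureJPEG's code (that is a C++ app); the function name `strip_app_segments` is made up for illustration, and the sketch assumes a well-formed file with no fill bytes between segments. It relies only on the standard JPEG layout: an SOI marker, then a run of length-prefixed segments, then the scan data. EXIF lives in an APP1 segment, so dropping APP1 through APP15 and comment segments removes it, along with any thumbnail stored inside it.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG


def strip_app_segments(jpeg: bytes) -> bytes:
    """Return a copy of `jpeg` without APP1..APP15 and COM segments.

    Hypothetical helper for illustration. APP0 (JFIF) is kept, and
    everything from the SOS marker onward is copied verbatim.
    """
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed image data follows, copy the rest as-is.
            out += jpeg[pos:]
            return bytes(out)
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        is_metadata = 0xE1 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            out += segment
        pos += 2 + length
    raise ValueError("no SOS marker found")
```

Usage would be along the lines of `open("out.jpg", "wb").write(strip_app_segments(open("in.jpg", "rb").read()))`. A real tool has to cope with malformed files and odd encoders, which this sketch does not attempt.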
In any case, on the topic of EXIF - I recently upgraded to a Canon Digital Rebel XT. I absolutely love this camera, but for whatever reason it sticks the camera serial number in the EXIF. Perhaps the serial number of a camera isn't really top secret, but nonetheless this seems like a completely needless piece of info to be sitting in every image I put online or elsewhere. It just seems like a piece of info that could be used in insurance fraud, retail deception, or some other nefarious activity. Perhaps it isn't secret, but it really isn't the sort of thing you shout from the rooftops (similar to how people and organizations obscure license plate numbers, yet they're really ridiculously un-private).
A much more profound privacy concern could come up when cameras finally start making use of the geographical coordinate points sitting largely unused in EXIF currently. These data points, storing items like latitude, longitude, and altitude, will make for absolutely brilliant geocoded photo databases once cameras start incorporating a GPS. For instance many cell phones are starting to incorporate a GPS to accommodate e911 requirements, and of course many cell phones already have onboard cameras, so it's inevitable that the technologies will collide. Imagine having the ability to search in a tool like Picasa for photos taken at a particular house, or in a particular park, without having the hassle of manually adding keywords categorizing each photo. Imagine a shared service like Flickr with brilliant locational searches.
Even better if cameras also stored the attitude and direction each photo was taken - Imagine seeing a cityscape with view cones emanating out, with colour coded focus zones (which can be determined by a variety of other EXIF data points). With a clean GPS signal, you could tell which photos were taken of someone sitting on the steps of city hall, out of the CN Tower looking towards Toronto Island, or towards the leaning tower of Pisa looking West at dusk during late August, all without relying upon haphazardly scatter-shotted user categorizing and captioning.
When this technology finally hits the mainstream - the merging of quality digital cameras and GPSs (likely through our phones) - the impact is going to be absolutely profound, and it will completely change how we archive and access our images.
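As a rough illustration of how those EXIF coordinate fields could feed a locational search, here is a small Python sketch. EXIF stores latitude and longitude as three rational numbers (degrees, minutes, seconds) plus a reference letter (N/S or E/W), so the first step is converting that to a signed decimal value. The function names and sample coordinates are my own inventions for the example, and reading the tags out of an actual file is left to whatever EXIF reader you prefer.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style GPS rationals to signed decimal degrees.

    dms: three (numerator, denominator) pairs for degrees, minutes, seconds.
    ref: 'N', 'S', 'E' or 'W'; south and west become negative.
    """
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown reference {ref!r}")
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return float(-value if ref in ("S", "W") else value)


def within_box(lat, lon, south, west, north, east):
    """True if a point falls inside a simple latitude/longitude box."""
    return south <= lat <= north and west <= lon <= east


# Hypothetical sample values, roughly in downtown Toronto.
lat = dms_to_decimal(((43, 1), (38, 1), (33, 1)), "N")
lon = dms_to_decimal(((79, 1), (23, 1), (132, 10)), "W")
print(lat, lon, within_box(lat, lon, 43.0, -80.0, 44.0, -79.0))
```

A photo database could store the two decimal values per image and answer "photos taken in this park" with a bounding-box query like `within_box`, no keywords required. The box test is deliberately naive: it ignores boxes that cross the 180th meridian.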
[SEE FOLLOWUP - http://www.yafla.com/dforbes/2005/10/06.html#a100, and http://www.yafla.com/dforbes/2005/11/29.html#a201]
One of the justified concerns when using an int identity as your surrogate primary keys is that you'll exceed the capacity of the data type. For example, if you accept the defaults, with your autonumbers seeded at 1 with an increment of 1, you have the capacity to store 2,147,483,647 records. While that sounds like a lot of records, and it most certainly is far beyond the lifetime size of most databases, it does have the potential of being exhausted in massive databases, or databases that see lots of rolled-back transactions (which still use up identity values). If it's a realistic possibility that you'll exceed 2 billion records, consider using one of the larger data types, such as a bigint. Avoid using the larger data types unless realistically necessary, however, as there is a storage and I/O cost that needs to be factored in.
Another potential solution is to take advantage of the negative range of the signed int. You could do this by seeding your identity values with -2147483648, incrementing from there. This will make your first record IDs less human friendly (e.g. CustomerID -2147483648 instead of CustomerID 1), however it will double the identity range available, offering up 4 billion+ identity values.
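The arithmetic behind that doubling is easy to check. The following Python sketch is just a calculator, not anything SQL Server itself provides; `identity_capacity` is a made-up helper that counts how many values a 32-bit signed identity can hand out for a given seed and positive increment.

```python
INT_MIN = -2**31      # -2,147,483,648, the smallest 32-bit signed int
INT_MAX = 2**31 - 1   #  2,147,483,647, the largest 32-bit signed int


def identity_capacity(seed, increment=1, max_value=INT_MAX):
    """Count the identity values available from `seed` up to `max_value`.

    Hypothetical helper: assumes a positive increment and that every
    value is consumed exactly once (rolled-back inserts count too).
    """
    if increment <= 0:
        raise ValueError("this sketch only handles positive increments")
    if seed > max_value:
        return 0
    return (max_value - seed) // increment + 1


print(identity_capacity(1))        # default seed of 1
print(identity_capacity(INT_MIN))  # seeded at the bottom of the range
```

Seeding at 1 gives 2,147,483,647 values, while seeding at -2147483648 gives 4,294,967,296, a little more than double because zero becomes usable as well.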
You could also do this in already existing and populated tables by resetting the seed to a negative value, for instance
DBCC CHECKIDENT ('YourTableName', RESEED, -2147483648)
However, this will lead to insert issues (as it'll be inserting at the head of the data if you've cluster indexed on your primary key), and the ident will get reset the next time you call
DBCC CHECKIDENT('YourTableName')