Dennis Forbes on Pragmatic Software Development
Monday, March 06 2006

One of the continuing trends of the Web 2.0 revolution is tag-mania -- sticking tags on everything and anything, hoping that it somehow improves the flow, digestion, and utility of information. From tag clouds on your blog, to Slashdot, to photos, to bookmarks, tags have continued to spread across the web landscape.

[Photo: Burlington Skyway]

As with every tech "revolution", in corporations across the globe eager employees are embracing the trend, advocating adding tags to documents and directories and files, and embracing the concept of metadata.

As a bit of an explanation for those who haven't been following TechCrunch in morbid curiosity -- wondering what dubious business came out of super-secret stealth alpha invite-only mode today -- and thus aren't up on their Web 2.0 lingo: tags are, in essence, a set of words that one or more users apply to something to categorize it. They are what we historically called keywords, albeit sometimes (though not always) with a "democratic" process determining the rendered tag set.

For instance the tags of this post might be "Web 2.0, tags". Ten visitors might add "tripe", making it the dominant tag in the tag cloud.
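As a rough illustration of how a tag cloud could arrive at a dominant tag, here is a minimal Python sketch. The function name `build_tag_cloud` and the normalization rules (lowercasing, trimming whitespace, counting each tag once per submitter) are assumptions made for this example, not a description of how any particular site does it.

```python
from collections import Counter


def build_tag_cloud(tag_submissions):
    """Count how many submitters applied each tag (hypothetical helper)."""
    counts = Counter()
    for tags in tag_submissions:
        # A set means one submitter repeating a tag still counts once.
        normalized = {tag.strip().lower() for tag in tags if tag.strip()}
        counts.update(normalized)
    return counts


# The author's own tags, plus ten visitors who each add "tripe".
author_tags = ["Web 2.0", "tags"]
visitor_tags = [["tripe"] for _ in range(10)]

cloud = build_tag_cloud([author_tags] + visitor_tags)
dominant_tag, votes = cloud.most_common(1)[0]
print(dominant_tag, votes)
```

With ten visitors each adding "tripe" once, that tag outweighs the author's own two tags, which each have a single vote.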

Getting a variety of people adding tags to the same content, or building a common directory of information loosely categorized by tags, is what's commonly called a folksonomy. Consider, for comparison, a formal taxonomy such as Yahoo's classic categorization, where a submitter would choose exactly where in the hierarchy a link went, and the Yahoo overlords would validate it and insert it if appropriate. With a folksonomy, by contrast, the loose addition of tags lets content accumulate multiple categorizations over time.

[Web 2.0 aware readers will probably shudder seeing an explanation of something so "basic", yet discussions in the field have led me to believe that much of this great revolution has gone unnoticed by the bulk of society, including even the majority of technology workers. I regularly converse with people who've never seen del.icio.us, don't know who 37signals are, and haven't been to Reddit or Digg or Flickr or Furl. Much like bloggers have grossly overestimated the impact of blogs on the general population, there seems to be a presumption that the Web 2.0 lingo and dogma is more universal than it actually is.]

While many of the Web 2.0 aficionados declare there to be a fundamental religious difference between the venerable keyword and tags, the difference is superficial at best (democratically selected keywords are still just keywords). The same keywords that have always existed as a data block in the JPEG file format, and that exist in virtually every document format (Word, for instance), form the foundation of tags. Metadata has been around since we first started storing data, and tags are a continuation of that trend.

Many of the foundations of modern tagging, the evolution of the keyword, were first demonstrated widely by the superlative web photo organizing and sharing application Flickr.

Given the primitive state of image recognition, this was a perfect fit: without tagging your photo with keywords such as "bridge, burlington skyway, qew", there was no way searches could find that photo if asked, for instance, for pictures of the Burlington Skyway bridge. We aren't yet at a stage where software can reliably figure out what the subjects of a picture are, and mechanical metadata is still incomplete (although it's getting there), so keywords/tags/folksonomies fill a critical gap in the photography data process.

Outside of photos the use of tags is often much more dubious.

To go back in history a bit, when search engines first appeared they largely relied upon meta keywords. This was a compromise due to limits in the "comprehension" of content -- search engines got confused easily, and even when they could parse the content properly they couldn't truly figure out what the content was about.

Keywords came along, offering a simple, condensed, human-created subset of the data, categorizing the important attributes of the content. Search engines embraced and utilized keywords as an important element of fulfilling search requests.

The honeymoon didn't last for long. It turned out that keywords were a prime stomping ground for search engine spammers, not to mention that they were a horribly limited method of searching through data: not only were the choices of keywords entirely subjective -- often grossly incomplete and inconsistent -- but by design they were limited to a very, very small subset of the content. If you really wanted content about metal railings, you might have missed my extensive discussion on that topic in my Burlington Skyway Bridge article because I didn't feel that metal railings made the cut for the keywords.
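The metal railings problem can be sketched in a few lines of Python. This is a toy comparison under stated assumptions: the `page` record, its body text, and both matching functions are hypothetical stand-ins made up for illustration, and real search engines of either era were far more involved.

```python
def keyword_match(query, keywords):
    """True only if the query equals one of the author-chosen keywords."""
    wanted = query.strip().lower()
    return any(wanted == keyword.strip().lower() for keyword in keywords)


def full_text_match(query, body):
    """True if the query phrase appears anywhere in the page body."""
    return query.strip().lower() in body.lower()


# Hypothetical page record; the body text is invented for this example.
page = {
    "keywords": ["bridge", "burlington skyway", "qew"],
    "body": (
        "The Burlington Skyway carries the QEW over the harbour. "
        "Its metal railings deserve an extensive discussion of their own."
    ),
}

query = "metal railings"
found_by_keywords = keyword_match(query, page["keywords"])
found_by_content = full_text_match(query, page["body"])
print(found_by_keywords, found_by_content)
```

The keyword lookup misses the page because the author never listed "metal railings", while matching against the content itself finds it.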

Meta tags are largely dead now.

[Photo: Lake Ontario]

In their place, search engines have become much better at determining what a given page is about (or at least simulating a reasonable proximity thereof). By analyzing content, having a directory of similar and derivative words, and deriving information from context (such as links and related pages, and how they word links) and layout (noting that heading text, title, and early text hold more importance in classifying the page, though they are still used in concert with the rest of the content), search engines have come a long way in understanding content, and in correlating searches with appropriate results.
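To make the layout point concrete, here is a minimal Python sketch of scoring a page by where the query terms appear. The field names and the weights in `FIELD_WEIGHTS` are assumptions picked for illustration; they are not taken from any real search engine, which would combine many more signals (links, synonyms, context).

```python
import re
from collections import Counter

# Assumed weights: title counts most, then headings, then body text.
FIELD_WEIGHTS = {"title": 3.0, "headings": 2.0, "body": 1.0}


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, doc):
    """Sum weighted term counts across the fields of a document."""
    terms = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(doc.get(field, "")))
        total += weight * sum(counts[term] for term in terms)
    return total


# Two hypothetical pages that both mention a bridge.
skyway_article = {
    "title": "Burlington Skyway Bridge",
    "headings": "",
    "body": "A bridge over the harbour.",
}
road_trip_post = {
    "title": "Road trip",
    "body": "We crossed a bridge.",
}

skyway_score = score("bridge", skyway_article)
road_trip_score = score("bridge", road_trip_post)
print(skyway_score, road_trip_score)
```

Both pages mention the word once in the body, but the page that also carries it in the title ranks higher, and the body text still contributes to the score rather than being ignored.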

The loss of the keyword has proven to be very beneficial for search. Now it's the actual data that classifies the content, rather than artificial metadata.

With improvements in language processors and context associative correlations (e.g. where the content parser understands that the paragraph on boxers is talking about the boxer breed of dog, determined by its correlation with other documents coupled with other details of the language, using language trees to classify probable meaning), things will only get better.

Content search has a very bright present, and a brighter future.

Yet tags continue to spread in woefully inappropriate domains, even where they serve as nothing more than the modern day equivalent of the venerable META keyword. Instead of building reliable, feature-rich search tools into their products, appropriately determining relationships and context to understand content, product vendors are just tossing in a hack-job tag infrastructure and calling their job complete.

Worse still, users are accepting it and calling it a feature.


Reader Comments

I think the fundamental difference between the keywords of old and the current tags is that tagging has a social aspect to it. Del.icio.us for example has many people tagging the same urls with different tags and each tag acting as kind of a vote on that topic. This lets the developer easily put those urls in buckets of most popular tags, confident that that url belongs there because lots of people say so, as opposed to one person categorizing that url where only he thinks it belongs.

Also, tagging isn't a way to improve search, it's a way to improve discoverability, or re-discoverability. When you do a Google search, you usually know exactly what you're looking for. When you're looking through a tag cloud, you're kind of just browsing, looking for something that catches your eye.

Obviously tagging isn't appropriate for every application, but there is a difference between tags and keywords.
Carlo @ 3/7/2006 4:10:09 AM
Hi there Carlos!

"Obviously tagging isn't appropriate for every application, but there is a difference between tags and keywords."

Indeed, I tried to cover the social/democratic aspect of tags.

That isn't always the primary use, however. The overwhelming majority of photos on Flickr, for instance, are only tagged by the creator, and there is only ever one set of tags (not a democratic confluence of tags). Del.icio.us uses the democratic tag model, but it does so because of a technological gap -- it knows nothing about the linked URL, so it relies upon a prehistoric and unreliable user categorization. Other recent tagging products are simply renamed keywords (see the Wiki product).

Regarding search, the primary use of tags on Flickr is to facilitate search, as it is on Del.icio.us. While users do of course browse around based upon tags, that functionality could be better handled by automatically and mechanically extrapolating subjects from text.

Tags are basically sweatshops of content interpretation, as a short-term solution/alternative to much better mechanical analysis techniques.
Dennis Forbes @ 3/7/2006 6:55:44 AM
I'd have to agree that tags are a little overused. I think, though, that it will be a long time before search engines can infer the connotative implications of an article. For instance, a search engine would have a tough time with satire or with inferring the logical implications of an article. If I understand how Google works correctly, it also relies on a sort of tag system. Based on the keywords people use when linking to an article, it can guess the more abstract uses of information. A great example is when enough people linked 'miserable failure' to the White House web page. I think tags are just an explicit form of the very information (emotions, opinions, connotative meaning) search engines need to determine relationships and social context. Ultimately, though, I think your last paragraph is correct: one can't rely solely on tags.
Craig @ 3/7/2006 8:25:02 AM
When the Boardgamegeek began accepting tags for Geeklists (lists of games, along with commentary), one of the less popular users posted a list hyping one of the games he designed.

Within minutes, his list was tagged with such useful tags as "You_Cant_Talk_To_A_PSYCHO_Like_A_Normal_Human_Being" "Shill" "Losershill" "Tiny_Unit", etc.

The admins wisely removed his list shortly thereafter...
Todd Derscheid @ 3/7/2006 7:05:35 PM
I must admit that I like tag clouds. When used on content that is well tagged, they show a pleasingly democratic exposition of the tags.

They also allow users to write their own opinion about something into the tags, as the above example shows. That's not necessarily a bad idea, provided that sort of unintended use is acceptable to the application owner.

It is much better than categorisation schemes that rely upon (often) poorly understood tree hierarchies. Jakob Nielsen points out that some users search, whilst others browse. Creating machine-tagged data is one way to automate the browsing experience, but runs into all the usual problems of it not understanding context (e.g. April 1st blog entries may need to be tagged 'humour', even if nothing in the entry itself suggests it is humour). Allowing users to tag data objects themselves is an efficient and useful way to do it as well.
Angus McDonald @ 4/4/2006 8:38:19 PM

Dennis Forbes - Dennis Forbes is a Toronto-based software architect and technology writer