Seth talks about the manipulation and gaming of blogs and meme sites, which I mention because the same topic -- opinion swarming & vote stuff -- is one that I've written about previously, and I remain very interested in observing it in action.
I started blogging on September 4th of last year.
I had an internet
presence prior to that (content that received several Slashdot
mentions, along with a half-decent number of inbound links),
but I didn't put up content with the regularity of a blog -- largely as a
function of the hassle involved -- and I didn't
have RSS, Atom, or any other feed technology (and thus wasn't
aggregated into other feeds).
It was just a hodge-podge of random pages prior to putting it into the structured form you see today.
All-in-all the past year has been a very, very rewarding experience: A very credible number of people visit, and from a search-engine perspective the results have been extraordinarily successful. Strange seeing several dozen people a day from my hometown coming by just because I happened to mention it in a blog entry.
To quote from a September 4th entry-
The question I am pondering, then, is whether the only way one can remain internet credible (in search engine terms) is to integrate heavily within the blogging community, quid-pro-quoing endless links and trackbacks, ingratiating oneself with other bloggers, posting meaningless comments about every posting every other blogger makes (which they will of course do in turn). It's a sort of super-pyramid scheme, but with no bottom level.
Thankfully I've never had to quid-pro-quo or ingratiate to maintain PageRank. In fact I think I've maintained a fairly antagonistic approach to many of the popular blogs and bloggers, and I've seldom resorted to inventing "material" out of mentioning other blogs.
Which brings up an interesting topic - I was chatting with a
peer about blogging and the effort/reward ratio, and they asked if
I felt that I had "succeeded" in this venture: Sure, they
pondered, I'd gotten a lot of mentions, along with a couple of
heavily visited
pages, but overall I still sit quite low on the list
of bloggers. My Alexa rankings
stink (though I should mention that Alexa rankings are
laughably useless outside of the top internet sites.
Alexa ratings are culled from users utilizing the Alexa or A9
toolbars, which is a vanishingly small number of users, clustered
into certain demographics. Just a couple of users occasionally
visiting with the toolbar has an absurdly large impact, so if I
wanted to shoot up in the rankings, I'd just recommend the toolbar
every month. As a case-in-point, at one point I noticed that my
Alexa ranking had jumped considerably, but became suspicious that a
disproportionate number of visitors visited the webstats
page...which of course only I visit. I realized it was me that was
inadvertently impacting the rankings when I had installed the A9
toolbar, so I removed it), and I'm not even among the top 1000
bloggers (by one metric I'm #5,269).
I have something like 118 bloglines subscribers, versus say 21,000 for someone like JoelOnSoftware (bloglines is only one of many aggregators, and Joel has far more subscribers overall, but it's a metric that is meaningful in a relative sense).
Yet I am thankful for every single reader, and the success of this blog is worlds beyond what I imagined. More important than quantity is quality, and some of the feedback leads me to believe that a great group of people have decided to drop by every now and then (even though many don't use feed readers, and just added it to their bookmarks for a once-a-month browse. That's the same technique I use for most blogs). Sure, complimenting your readers is a suspect activity, and is often driven by egotism above all else, but I really mean it: I couldn't have asked for a better readership.
And perhaps this will come off as cheap or like sour grapes (which it most certainly isn't -- I set out expecting an occasional accidental search visitor, and never anticipated the success this has seen), but there are some pretty easy ways I could have modified the message a bit to build and maintain a much greater blog presence. That simply wasn't my goal.
I could...
None of these techniques are secrets, but they're only acceptable modus operandi if your primary goal is, well, blogging. That isn't my primary goal by a long shot, and I have no ambitions of becoming a professional blogger. Instead I'm motivated to talk to, and hopefully influence -- and maybe even impress -- intelligent and influential people.
In the coming few months (or more correctly weeks) I have several very, very exciting things that are going to come out, including the most exciting and innovative web idea I've ever had. It's only going to get better.
But I'll never compromise the message, and I'll never let metrics and stats give me misdirected motivation.
Last Wednesday I was mentioned in the Wall Street Journal (right there on the front of the second section of one of the world's most prestigious newspapers), being referred to as the "world's pre-eminent domainologist" (an article that has been referenced in countless other sources now, including some errant attributions, such as the Toronto Star -- my hometown paper -- seemingly making me a Verisign employee, which of course I'm not).
Apparently -- or so my wife tells me, given that I don't listen to or read anything that involves me in any way, and even when she talks about this stuff I cover my ears and basically repeat "LaLaLa"s to drown it out -- it was a well-written, humorous piece. While I apparently played the part of a fringe bit-player, my name does appear quite early in the article, and that's pretty neat to me.
The mention doesn't bring me monetary rewards, and it really doesn't contribute to my professional success in any measurable way (though it's very neat being mentioned, and it was a hugely fun process working with Lee to get the raw material and provide some basic quotes, I'm not really in the business of domain names, and it isn't really a hobby of mine -- being attributed in such a way isn't really something I really want to leverage), but it is yet another weird, discordant mention in mainstream media.
So long as it isn't the notorious sort of mention, somehow it all works into my grand plan of world domination. Bwahahahaha! <rubs eyebrows>
It all began with a couple of emails from Lee. He indicated that he was from the WSJ, and was interested in talking with me about an article he was considering. After some difficulty finding a common point of availability, we finally chatted in person. This was around Wednesday of the week before.
That evening Lee recorded an initial phone interview, indicating that he had come across my article from back in March, and knew that it had seen a lot of success (for those who didn't see it, it was an article that took off like wildfire across the net, seeing front page action on Digg, Reddit, and mentions from numerous `A-list' bloggers. Quite a few of the entries on here have seen wide "link-love", but the domain name entries absolutely blew all prior -- and following -- records away, seeing close to 100,000 visitors a day for a period of time, still maintaining a lot of incoming interest).
Given that he hadn't come across similar research (he did ask if I knew anyone else doing similar research, perhaps probing to see if I was just a sub-eminent domainologist, and whether I would defer to a greater authority), he decided to base his article on information I provided, both in the initial article and in numerous follow-up queries he asked me to run.
One particularly exhaustive query took around 20 hours of runtime.
All in all it was a lot of fun, and from my end was nothing more than a couple of very brief phone interviews, and then some randomly kicked off queries and emailed results.
Despite the fact that the article in question provides limited personally identifying information (and while it's accurate, it is a bit misleading for some. For instance I'm not in New York City -- I'm actually here in a suburb of Toronto -- and the article of course apparently doesn't mention this blog), the immediate effect of the article was dozens of phone calls from people across the US -- and the world -- asking for my opinions on business ideas, asking if domain names people held were good ones, asking if I was interested in partnering on some project or other, asking how to get access to the raw data (see the comments in the main entry -- there's a link to the fax forms), and asking how I ended up being referenced in a WSJ article.
This blog also saw a lot of activity because of the article, with a number of people coming here after searching up obvious terms like "Dennis Forbes domain name". I'm still seeing WSJ-related search activity today (maybe hermits are just adding the issue to their apartment newspaper mountains).
I've received requests for radio interviews (I've done a couple of those before, and it isn't my favourite genre: I'm too full of self-doubt when it comes to accuracy, and mortally fear the possibility of saying something incorrect in response to an ad hoc question. In such an instance I'd rather say nothing until I can verify, with certainty, that what I'm saying is correct. I haven't been "blessed" with the arrogance and confidence that allows some to make the most absurd of proclamations with zero self-doubt or hesitation), and have gotten requests for, and responded to, several email interviews.
All in all a very entertaining process, and it was interesting to take part in it. It has me looking for my next angle for media exposure.
Of course Lee was being facetious when he assigned me this title, and really I found it gut-bustingly hilarious when I heard it myself.
The original domain name article actually came about because I needed a medium-sized database to demonstrate high-performance database operations. While I was indeed curious about domain names, ultimately I requested access purely to have a large set of data to demonstrate some index-backed operations. I was shocked when I discovered that one could actually acquire a copy of the zone file.
I really haven't been poring over zone files for years, amazingly reading trends and consistencies from streams of raw data.
After receiving the data, I saw that it really was interesting and entertaining, so in a single night I threw together the original article: Right after getting my credentials from Verisign, I downloaded the 850MB compressed file, extracted, imported and cleansed it, and then ran some humorous queries to see if it yielded interesting results. Seeing some of the answers, I thought it would be good blog fodder so I tossed an article together and put it online.
Over the next week I only had a free moment here and there, so I belatedly put up a follow-up article, in my haste skipping many of the tests that I had promised (for instance the English language queries, which I only finally finished at Lee's request).
My interest was short-term, and my technique was mostly driven by the biggest bang-for-the-buck queries that would yield interesting blog material, while allowing me to save my time for my family and my profession (in that order). I wasn't really sitting there month after month anxiously watching streaming domain name data, inferring complex patterns like a savant. Instead it was a couple of low-hanging fruit queries against the imported dataset, writing up the results when it was unexpected or entertaining.
Of course then the material seemed to be exhausted (the follow-up article saw much less attention), and my personal curiosity waned. The database then sat and collected bitular rust.
It's a marvel that it didn't get deleted to free up room for my prime-number database, or my ridiculously expanding set of digital pictures.
Then Lee called, I fired up the database -- to my surprise I still had it -- and the rest is history.
Populism has seldom been a goal of these entries, but a couple of entries, not to mention observations of the meme sites, have given me some insights into what are some elements that increase the probability of an entry taking off. Let's just say that I'm the World's Pre-Eminent Meme Site Popularity Assessment Expert (WPEMSPA, aka Wimpy-Sumpa).
That's about it. Create entries covering everyday topics, and populate it with easy to digest graphics, and summaries that give cursory linkubators comfort that they're linking something interesting.
Enjoy the endless incoming traffic!
For close to two months now, I've been rather negligent of this blog. The reasons are numerous, however the following is a list of the primary causes.
Thanks for reading along, and have a fantastic day and week ahead!
Dennis
Before I go about possibly reinventing the wheel, I thought it worthwhile to ask: Could anyone point me to .NET / Windows server modules for SXIP 2.0 and/or OpenID? They're both fairly trivial identity solutions, so if I can't find one I'll implement one or both. Not only for personal needs, but because I can see some uses for them in client projects.
Thank you kindly.
If you believe the Alexa graph, Digg is a significantly more popular site than Reddit, with the gap growing larger every day.
I find these results surprising. I've had half-a-dozen entries on the front page of Reddit, with each yielding from 2500-5600 distinct referrals per day. In comparison, I've had a front page entry on Digg for a day, only bringing in ~750 referrals.
Of course, a single entry isn't a very good sample, and it's entirely possible that most people just weren't interested in the link -- that it was a fluke that it got on the front page in the first place -- but I've seen several other stat watchers mention very similar stats (that a front-page of Digg yielded them 800 or so visitors per day), so I'm not basing this comment simply on my own observation. I've looked for exceptions to this, and found one individual who had a broad-interest link atop Digg's front page for 24 hours straight, and they claim that for that day they received a total of 7200 distinct referrals, and then it rapidly tailed off, disappearing in two days. That case seems to be the exception.
One possibility is that Digg offers more link diversity and thus the much greater traffic is dispersed, significantly reducing the impact on any one link. Alternately, perhaps Digg users spend more of their time within the Digg community, rather than following the links (in the same way that many Slashdot readers just make assumptions about the linked article, responding accordingly, rather than RTFA).
Another possibility is that Digg caters to a crowd that is more likely to have the Alexa/A9 toolbars installed, both of which feed back the stats that are used to drive the Alexa popularity metrics. Given that they're somewhat infrequently used toolbars, and are much more likely among certain crowds (and seem to appear in clusters), the traffic rankings are a bit of a crapshoot outside of the top sites -- here on yafla I've had days with 6000 visitors where my Alexa ranking doesn't budge, whereas other days 2000 visitors cause it to quintuple.
One of the continuing trends of the Web 2.0 revolution is tag-mania -- sticking tags on everything and anything, hoping that it somehow improves the flow, digestion, and utility of information. From adding tag clouds to your blog, to Slashdot, to photos, to bookmarks, tags have continued to spread across the web landscape.
As with every tech "revolution", in corporations across the globe eager employees are embracing the trend, advocating adding tags to documents and directories and files, and embracing the concept of metadata.
As a bit of an explanation for those who haven't been following TechCrunch in morbid curiosity -- wondering what dubious business came out of super-secret stealth alpha invite-only mode today -- and thus aren't up on their Web 2.0 lingo, tags are, in essence, a set of words that one or more users apply to something to categorize it -- what we historically called keywords, albeit sometimes (though not always) with a "democratic" process determining the rendered tag set.
For instance the tags of this post might be "Web 2.0, tags". Ten visitors might add "tripe", making it the dominant tag in the tag cloud.
Getting a variety of people adding tags to the same content, or building a common directory of information loosely categorized by tags, is what's commonly called a folksonomy. Consider, for comparison, a formal taxonomy of a system like Yahoo's classic categorization, where a submitter would choose exactly where in the hierarchy a link went, and the Yahoo overlords would validate it, and insert it if appropriate. Instead the loose addition of tags adapts to have multiple categorizations over time.
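For the code-minded, the "democratic" part really is just counting. Here's a minimal sketch of how a tag cloud might be tallied, using the example from this post. The `tag_cloud` function and the data layout are my own invention for illustration, not how any particular site does it.

```python
from collections import Counter

def tag_cloud(submissions):
    """Count how many users applied each tag, most popular first.

    `submissions` is a list with one entry per user, each entry being
    the list of tags that user applied (a hypothetical layout).
    """
    counts = Counter()
    for user_tags in submissions:
        # Normalise case and whitespace, and use a set so that one user
        # repeating a tag still only counts once.
        counts.update({tag.strip().lower() for tag in user_tags})
    return counts.most_common()

# The author's own two tags, plus ten visitors who each add "tripe".
submissions = [["Web 2.0", "tags"]] + [["tripe"] for _ in range(10)]
cloud = tag_cloud(submissions)
```

The first entry of `cloud` is the dominant tag, which a renderer would typically draw in the largest font.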
[Web 2.0 aware readers will probably shudder seeing an explanation of something so "basic", yet discussions in the field have led me to believe that much of this great revolution has gone unnoticed by the bulk of society, including even the majority of technology workers. I regularly converse with people who've never seen del.icio.us, don't know who 37signals are, and haven't been to Reddit or Digg or Flickr or Furl. Much like bloggers have grossly overestimated the impact of blogs on the general population, there seems to be a presumption that the Web 2.0 lingo and dogma is more universal than it actually is]
While many of the Web 2.0 aficionados declare there to be a fundamental religious difference between the venerable keyword and tags, the difference is superficial at best (democratically selected keywords are still just keywords). The same keywords that have always existed as a data block in the JPEG file format, and exist in virtually every document format (Word, for instance), form the foundation of tags. Metadata has been around since we first started storing data, and tags are a continuation of that trend.
Many of the foundations of modern tagging, the evolution of the keyword, were first demonstrated widely by the superlative web photo organizing and sharing application Flickr.
Given the primitive state of image recognition, this was a perfect fit: Without tagging your photo with keywords such as "bridge, burlington skyway, qew", there was no way searches could find that photo if asked, for instance, for pictures of the Burlington Skyway bridge -- We aren't yet at a stage where software can reliably figure out what the subjects of a picture are, and mechanical metadata is still incomplete (although it's getting there), so keywords/tags/folksonomies fill a critical gap in the photography data process.
Outside of photos the use of tags is often much more dubious.
To go back in history a bit, when search engines first appeared they largely relied upon meta keywords. This was a compromise due to limits in the "comprehension" of content -- search engines got confused easily, and even when they could parse the content properly they couldn't truly figure out what the content was about.
Keywords came along, offering a simple, condensed, human-created subset of the data, categorizing the important attributes of the content. Search engines embraced and utilized keywords as an important element of fulfilling search requests.
The honeymoon didn't last for long. It turned out that keywords were a prime stomping ground for search engine spammers, not to mention that it was a horribly limited method of searching through data: Not only were the choices of keywords entirely subjective -- often grossly incomplete and inconsistent -- but by design it was limited to a very, very small subset of the content. If you really wanted content about metal railings, you might have missed my extensive discussion on that topic in my Burlington Skyway Bridge article because I didn't feel that metal railings made the cut for the keywords.
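To make the metal railings problem concrete, here's a small sketch comparing a keyword-only lookup with a naive search over the content itself. The page record, its body text, and both function names are made up for illustration; real engines are obviously far more involved than a substring test.

```python
def keyword_search(pages, term):
    """Match only against the keywords the author chose to list."""
    term = term.lower()
    return [page["title"] for page in pages if term in page["keywords"]]

def content_search(pages, term):
    """Match against the actual text of the page (naive substring test)."""
    term = term.lower()
    return [page["title"] for page in pages if term in page["body"].lower()]

# A made-up stand-in for the Burlington Skyway Bridge article.
pages = [
    {
        "title": "Burlington Skyway Bridge",
        "keywords": {"bridge", "burlington skyway", "qew"},
        "body": "An extensive discussion of the bridge and its metal railings.",
    },
]

keyword_hits = keyword_search(pages, "metal railings")
content_hits = content_search(pages, "metal railings")
```

The keyword lookup comes back empty because "metal railings" never made the cut, while the content search finds the page.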
Meta tags are largely dead now.
In their place, search engines have become much better at determining what a given page is about (or at least simulating a reasonable proximity thereof). By analyzing content, having a directory of similar and derivative words, and by deriving information by context (such as links and related pages, and how they word links) and layout (noting that heading text, title, and early text hold more importance in classifying the page, though they are still used in concert with the rest of the content), search engines have come a long way in understanding content, and in correlating searches with appropriate results.
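As a rough illustration of the layout idea, here's a toy scorer that counts query terms in each part of a page and weights the title and headings more heavily than the body. The weights, field names, and sample pages are all assumptions of mine. This is a sketch of the general principle, not how any actual search engine ranks.

```python
import re

# Assumed weights: title text counts most, then headings, then body.
WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(page, query):
    """Sum weighted counts of query terms across the page's fields."""
    terms = set(tokenize(query))
    total = 0.0
    for field, weight in WEIGHTS.items():
        words = tokenize(page.get(field, ""))
        total += weight * sum(1 for word in words if word in terms)
    return total

def rank(pages, query):
    """Return titles of matching pages, best score first."""
    scored = [(score(page, query), page["title"]) for page in pages]
    scored.sort(key=lambda pair: -pair[0])
    return [title for points, title in scored if points > 0]

# Hypothetical pages: one is about boxers, one merely mentions one.
pages = [
    {"title": "Sports roundup", "heading": "", "body": "A boxer trains daily."},
    {"title": "Boxer dogs", "heading": "Care", "body": "Feeding and exercise."},
]
results = rank(pages, "boxer")
```

Both pages match, but the one with "boxer" in its title outranks the one that only mentions it in passing.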
The loss of the keyword has proven to be very beneficial for search. Now it's the actual data that classifies the content, rather than artificial metadata.
With improvements in language processors and context associative correlations (e.g. where the content parser understands that the paragraph on boxers is talking about the boxer breed of dog, determined by its correlation with other documents coupled with other details of the language, using language trees to classify probable meaning), things will only get better.
Content search has a very bright present, and a brighter future.
Yet tags continue to spread in woefully inappropriate domains, even where they're serving as nothing more than the modern day equivalent of the venerable META keyword. Instead of building reliable, feature-rich search tools into products, appropriately determining relationships and context to understand content, product vendors are just tossing in a hack-job tag infrastructure and calling their job complete.
Worse still, users are accepting it and calling it a feature.