Dennis Forbes on Software and Technology


About the Author
Dennis Forbes is a Toronto-based software architect. While focused primarily on the .NET and SQL Server worlds, Dennis frequently ventures outside of this comfort zone into game development and image processing. He has been published in several industry magazines, has been quoted in the Wall Street Journal, and has been interviewed by NPR.

He is a vice president and lead software architect at an innovative New York City hedge fund back-office services firm.

Dennis has been working on solutions for the financial, telecommunications, and power generation markets for over 15 years.




Saturday, March 27 2010

Joe Stump – the former Digg lead architect with the coolest name in tech – posted a peripheral response to my recent entry about SSDs and NoSQL.

Rebuttal in tl;dr; Form

The original post was motivated by claims found on Digg’s technology blog.

  • They say that the RDBMS “mindset” favours writes over reads: BLATANTLY WRONG CLAIM.
  • They show poor index and schema use: WRONG DATABASE USAGE.
  • They show that their database product can’t join: BAD DATABASE SERVER. RED FLAG.
  • They report very poor performance without adequate detail: MEANINGLESS PROPAGANDA.
  • They use this to show that the RDBMS can’t cope: SEE ABOVE.
  • They say that if you don’t use all of an RDBMS’ feature set, you’re essentially using NoSQL: ABSURD.
  • They describe scaling out issues with databases: TRUE FOR MYSQL.
  • They described their move to NoSQL: GREAT FOR THEM. THOUGH REALLY THEIR SOLUTION WAS EXTREME DENORMALIZATION.

And on Joe’s post.

  • You need an expensive DBA with the RDBMS, not with NoSQL: SPECIOUS, FLAWED REASONING.
  • Capital expenses suck. Services are better: BUSINESSES GENERALLY LEASE THESE DAYS.
  • $7,500 “just for disks”: FOR A SaaS BUSINESS THIS IS CHEAP.
  • 50 node cluster: 50 NODES IS A COMPENSATION FOR ABHORRENT I/O RATES.
  • SSD drives are expensive: NO THEY AREN’T. YOUR ARGUMENT IS OBSOLETE.
  • Commercial database products are pricey: VIGOROUS AGREEMENT.
  • NoSQL $/read and $/write win: MAYBE, MAYBE NOT. DIGG COULD LIKELY DO MORE WITH A COUPLE OF SSDs THAN THEY CAN WITH THEIR MASSIVE DENORMALIZATION.

The Non-ADD Version

Joe has been in the Web 2.01 trenches. He built a solution that powers one of the top sites on the net.

Remember when getting "Slashdotted" was a big deal? Getting on the front-page of Digg makes a Slashdotting-at-its-peak look like a little traffic bump. There are probably a hundred PR reps busy trying to botnet their clients onto the front-page of Digg for every one punished into spamming Slashdot these days.

Far more people know Joe’s all-out-of-bubblegum name than will ever know mine, and rightly so.

A Strawman Built on Cliches and Appeals To...

Joe comes out of the gate resorting to the venerable old-versus-new tactic: "It's just those old-school DBAs upset that us kids are rewriting the rules," he says in not so many words, while nailing himself and his peers onto a cross, seeking pity for the flames they doth receive for their unconventional, rebellious ways.

This is a bit strange, really. Barely a day goes by lately without Hacker News or Reddit’s /r/programming featuring another front-pager about how the Incredible NoSQL is rewriting the rules of, well, everything. The general demeanour is one that, I think, is far more sympathetic to completely unsupported and undemonstrated pro-NoSQL claims than it is to anything that questions the hype.

Countless NoSQL blogs have appeared, though if you browse them looking for actual content you’ll find that most feature few facts and lots of zealous punditry; advocacy seems to be the primary focus right now. Anyone involved with any sort of NoSQL initiative is spinning off their own start-up to capitalize on this sure-win formula, acting like it’s some sort of magic ingredient that will assure them of success.

It is very reminiscent of the XML heyday – I’m a very big fan of XML in its place, as an aside – when countless start-ups appeared with business models that could be boiled down to “something to do with XML”.

The big database vendors have remained quiet, largely because the minuscule-budget operations all clamouring for their piece of the NoSQL pie aren’t worth bothering with.

“But what about Google, Amazon, and Twitter!” you say. Joe resorted to that same appeal to authority by incanting the same magical trio (say it three times quickly and your TPS rate will quadruple!). Not really much to bother with there, beyond pointing out what a cargo cult is. Your bamboo headset won't make you successful like Google. It really won’t.

Unless you are targeting the same problem space as those companies – say like providing very low performance but highly “scalable” database solutions for countless low-value start-ups – their solution choices are utterly irrelevant.

I'm not a DBA (though knowing how indexes work now strangely qualifies one for such a title). I'm just a technically curious solutions guy that has an innate need to keep asking questions and probing deeper until the Want-To-Believe fog that often hides hype dissipates.

On Rinky-Dink Operations

In Joe’s entry he focuses a lot of attention on the costs of RDBMS solutions.

One such argument is that it’s better to use computing hardware as a service than to buy, seemingly implying that while you can buy good hardware to run a RDBMS, it is better to rent less-good virtual hardware to run your NoSQL instances.

Yet leasing is what all the cool kids are doing these days, largely for the same financial reason. Writing it all off beats dealing with depreciation BS, and it makes financial planning a lot easier.

On the leasing front, $600 a month gets you an insanely powerful, makes-an-Extra-Memory-Quadruple-Extra-Large-EC2-Instance-Look-Like-A-Pile-Of-Puke server.

You’ll probably be paying 20x that for every developer you have working on your solutions. Is this really so astronomically high?

That less-than-the-cost-of-the-office-cleaners price tag gets you a server with a bank of striped SSDs that will almost certainly demolish your impressive-in-count-but-not-in-throughput big scale-out cluster, at least with a non-broken RDBMS system.

No really, it will. Of course for any sort of reliable system you’d have to pay for some DB licenses (presuming you aren’t going with PostgreSQL), and then you’ll want to double everything up into mirrors or some other reliable setup, so triple the price.
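To put those figures side by side, here is a back-of-the-envelope sketch in Python. It uses only the numbers quoted above ($600 a month for the server, roughly 20x that per developer, and tripling for licenses and mirroring); the constant names are mine, and the multipliers are the rough estimates from the text, not measured costs.

```python
# Rough monthly figures taken from the text above; all are estimates.
SERVER_LEASE_PER_MONTH = 600   # one leased SSD-backed server, USD
DEVELOPER_MULTIPLE = 20        # "20x that for every developer"
RELIABILITY_MULTIPLE = 3       # licenses plus mirroring: "triple the price"

# Cost of a single developer per month, by the 20x rule of thumb.
developer_per_month = SERVER_LEASE_PER_MONTH * DEVELOPER_MULTIPLE

# Cost of the licensed, mirrored database setup per month.
reliable_setup_per_month = SERVER_LEASE_PER_MONTH * RELIABILITY_MULTIPLE

# The database setup expressed as a fraction of one developer's cost.
setup_vs_developer = reliable_setup_per_month / developer_per_month

print(f"One developer:   ${developer_per_month:,}/month")
print(f"Mirrored server: ${reliable_setup_per_month:,}/month")
print(f"Setup as a share of one developer: {setup_vs_developer:.0%}")
```

Even tripled, the database hardware comes in well under the monthly cost of a single developer by these estimates.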

And really, is the $7,500 spent by 37signals on a disk array even worth mentioning? I suspect that sort of number ends up almost as a rounding error on their expense sheets, and given that it's pivotal to their operation – it sits under the very foundations of their business – I doubt they spent many sleepless nights over it.

What sort of rinky-dink operations are we talking about here? Does Digg still qualify as a start-up? Don't they have a payroll and all of that, yet they're clamouring to wire up a collection of discount bin servers?

I posted the SSD entry because SSDs really do fascinate me, and I do think they change a lot of the rules of the game. It just happened to dovetail nicely with my investigation of the Digg scenario, where Digg solved their very real I/O issue by essentially pre-caching every possible query result for a targeted need.

Through extreme denormalization they traded storage to reduce I/O needs.

This is a very important point, because it’s far more pivotal to Digg’s solution than the NoSQL versus RDBMS debate.
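The trade can be sketched in a few lines of Python. This is a toy illustration of the general idea (do the work once at write time and store the answer, instead of combining two data sets on every read), not Digg's actual schema or code. The users, the story ID, and every function name below are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: user -> set of users who follow them.
followers = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "carol": set(),
}

# Inverse mapping for the normalized read path: user -> who they follow.
following = defaultdict(set)
for user, fans in followers.items():
    for fan in fans:
        following[fan].add(user)

# Normalized storage: story -> users who dugg it.
diggs = defaultdict(set)

# Denormalized storage: (viewer, story) -> the viewer's friends who dugg it.
friend_diggs = defaultdict(set)


def record_digg(user, story):
    """Store the digg once, then fan it out to every follower's precomputed answer."""
    diggs[story].add(user)
    for fan in followers.get(user, ()):
        friend_diggs[(fan, story)].add(user)


def friends_who_dugg_normalized(viewer, story):
    """Read path that combines two data sets on every request (the 'join')."""
    return following[viewer] & diggs[story]


def friends_who_dugg_denormalized(viewer, story):
    """Read path that is a single key lookup of a stored answer."""
    return set(friend_diggs.get((viewer, story), ()))


for digger in ("alice", "bob", "carol"):
    record_digg(digger, "story-1")

# Both read paths agree; the second one just paid for it with extra writes.
print(friends_who_dugg_normalized("carol", "story-1"))
print(friends_who_dugg_denormalized("carol", "story-1"))
```

Each digg now costs one extra write per follower, and the stored answers grow with the size of the social graph. In exchange, a read needs a single lookup. That is storage spent to save I/O, and it works the same way whatever database sits underneath.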

Call up your old Digg coworkers, Joe, and have them set up a real database server with a couple of SSD drives and see how it compares with their impressive cluster. I’ll bet Dell would happily lend them a real server.

All of this is a bit humorous, really: the whole point of my original entry on this NoSQL topic was simply to say “what is good for Digg isn't necessarily appropriate for all database needs”, so it’s a bit unfortunate that it has come to this, with Digg’s former architect justifying their decision when Digg was held up as a scenario where it is likely the perfect solution.

Then, after seeing the Digg case-study, I felt obliged to respond to their RDBMS claims because I saw them as flawed, indicative that the movement should really be called NoMySQL instead of NoSQL. It still doesn’t diminish the correctness of their choice.

But really, while I originally entered into this debate believing simply that NoSQL is being oversold (it is grossly inappropriate for the vast majority of non-Web 2.0 projects), the more I investigate the more I’m coming to think that it is a solution for the rapidly disappearing problem of pathetic I/O rates, at least assuming that you aren’t running on one of the several cloud solutions where that is your only choice.

There are many other differences that come with NoSQL (many strongly questionable, like the oft lauded “no schema” claim for some of the solutions), but the I/O restriction is by far what sold it on the high end, and the high end is what convinced the little guy that it’s the way to go.

Oracle, DB2, SQL Server, Teradata, Vertica, Greenplum, Sybase and Friends All Cost Way Too Much

I very strongly agree with Joe about one thing: the licensing costs of the big RDBMS products are way too high.

They know that 2% of their potential customer base have giant budgets, and that they can squeeze more from that 2% than they could ever get from the other 98% who then get relegated to fighting over scraps like MySQL.

Not really sure how to solve that problem, but I concede that it is a non-trivial issue. PostgreSQL is probably the best low-to-no-cost database server, but even then quite a few performance features are missing (like real-time materialized views or SQL Server style clustered indexes).

Reader Comments

Dennis, I agree with what you said.

Also, I would suggest Joe to give out sample schema/data for an open competition, that's all.
Rycna @ 3/27/2010 8:43:41 PM
Hey Dennis,

I've been following this whole NoSQL vs RDBMS thing for a while, and find the whole "RDBMS isn't scalable" argument quite amusing as you might imagine.

What has intrigued me though is that you've been putting down MySQL. Is it that bad? I've been using it for about 8 years, but I'm not opposed to switching if there are some clear benefits. Would I see a performance boost by switching to Postgres?
Adam @ 3/28/2010 2:31:13 AM
Dennis,

Thanks for taking the time to post all of the NoSQL vs RDBMS stuff. It's nice to see some healthy debate that highlights the strengths / weaknesses of each solution.
Ryan @ 3/29/2010 5:53:06 AM
Enjoying all this...but how about *really* rinkydink operations? I use SQL Server at work and like it, but for personal unfunded startups on the side, I can't afford a physical server and license. I could go with Azure, but a single instance runs out of storage kinda fast, and then I either have to spend the big bucks on a real server (and hope I have a good revenue model, which I generally don't), or roll my own horizontal scalability...essentially doing things that systems like Cassandra do out of the box.
DennisP @ 3/29/2010 4:48:23 PM
Your posts have been bloody brilliant. It might seem like a little thing, but after your original scalability post the entire tone of the conversation has changed. It is a welcome bit of reality.
Jacob @ 3/29/2010 4:57:26 PM
@DennisP: There are ways you could reuse your knowledge in SQL Server for your personal unfunded startup(s).

1. SQL Server Express (yes, 1GB RAM max, 4GB storage max, 1 core max)
2. BizSpark : get all the high end licenses for free for three years http://www.microsoft.com/BizSpark/

At the end you either purchase the license(s) from then on or pay 100 (one hundred) USD and go your separate way.

I guess that if after three years you can't afford the license(s) the business ain't going anywhere anyway..
Andrei Rinea @ 3/30/2010 6:18:46 AM
