Tuesday, March 02 2010

Getting Defensive

I work in the financial industry. RDBMS’ and the Structured Query Language (SQL) can be found at the nucleus of most of our solutions.

The same was true when I worked in the insurance, telecommunication, and power generation industries.

So it piqued my interest when a peer recently forwarded an article titled “The end of SQL and relational databases”, adding the subject line “We’re living in the past”.

[Though as Michael Stonebraker points out, SQL the query language actually has remarkably little to do with the debate. It would be more clearly called NoACID]

That series focuses on NoSQL as the challenger to the throne. It isn’t alone, as the past year has yielded a bountiful crop of articles and blog entries declaring the imminent death of the decrepit relational database at the hands of this new innovation.

Most get posted with incendiary, absolute statements against the RDBMS.

The ACIDy, Transactional, RDBMS doesn’t scale, and it needs to be relegated to the proper dustbin before it does any more damage to engineers trying to write scalable software.

And they usually see later edits that blunt the original euphoria.

postnote: This isn’t about a complete death of the RDBMS. Just the death of the idea that it’s a tool meant for all your structured data storage needs.

Indeed.

Few hold the RDBMS as the only tool for all of your structured or unstructured data storage needs, though that strawman makes an appearance in many NoSQL advocacy pieces, adding some unintentional comedy (“irony”) given that the same entries usually call for the death of the RDBMS, with NoSQL declared the one true way to store and retrieve data.

Page 493 (as labelled by page) of the article “The Paradoxical Success of Aspect-Oriented Programming” includes a fantastic quote and graphic from an IEEE editorial by James Bezdek in IEEE Transactions on Fuzzy Systems.

[I quote indirectly given that the original source isn’t publicly available]

Every new technology begins with naive euphoria — its inventor(s) are usually submersed in the ideas  themselves; it is their immediate colleagues that experience most of  the wild enthusiasm. Most technologies are overpromised, more often  than not simply to generate funds to continue the work, for funding is an integral part of scientific development; without it, only the most  imaginative and revolutionary ideas make it beyond the embryonic stage. Hype is a natural handmaiden to overpromise, and most technologies build rapidly to a peak of hype. Following this, there is almost always an  overreaction to ideas that are not fully developed, and this inevitably leads to a crash of sorts, followed by a period of wallowing in the depths of cynicism. Many new technologies evolve to this point, and then fade away. The ones that survive do so because someone finds a good use (= true user benefit) for the basic ideas.

In the case of the NoSQL hype, it isn’t generally the inventors over-stating its relevance — most of them are quite brilliant, pragmatic devs — but instead it is loads and loads of terrible-at-SQL developers who hope this movement invalidates their weakness.

Some sort of Fight Club ground zero wiping of the records, rewriting the rules of the game.

It doesn’t.

Nonetheless there is indisputably a lot of fantastic work happening among the NoSQL camp, with a very strong focus on scalability.

So what is scalability, anyways?

Scalability is a poorly-defined concept that, more often than not, is twisted to suit the speaker’s agenda. Scalability is often the excuse to engage in absurd hypotheticals to sell a particular blend of fanaticism.

Putting aside wordplay — or perhaps to engage in some of my own — scalability is pragmatically the measure of a solution’s ability to grow to the highest realistic level of usage in an achievable fashion, while maintaining acceptable service levels.

Imagine the scenario that you’ve built an internal help ticket tracking system for your branch office of Money Bags Corporation. If you had to describe the data needs in three points, they would be-

  • Data is highly interrelated (relational)
  • High-value users and transactions
  • Data consistency and reliability is a primary concern

You decide to go against the hype and build it on a classic RDBMS system.
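
To make those three points concrete, here is a minimal sketch of what the relational approach buys you. It uses Python's built-in sqlite3 module purely as a stand-in for a full RDBMS, and the table and column names are hypothetical, not taken from any real ticket system. The point it illustrates is that a batch of related writes either lands completely or not at all.

```python
import sqlite3

# In-memory SQLite stands in for the real RDBMS; schema names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE ticket ("
    " id INTEGER PRIMARY KEY,"
    " opened_by INTEGER NOT NULL REFERENCES employee(id),"
    " summary TEXT NOT NULL)"
)
conn.execute("INSERT INTO employee (id, name) VALUES (1, 'Pat')")
conn.commit()


def open_tickets(conn, rows):
    """Insert all (employee_id, summary) rows, or none of them."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO ticket (opened_by, summary) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = open_tickets(conn, [(1, "Printer on fire")])
# Employee 99 does not exist, so the whole batch is rejected and rolled back.
bad = open_tickets(conn, [(1, "VPN down"), (99, "Orphan ticket")])
ticket_count = conn.execute("SELECT COUNT(*) FROM ticket").fetchone()[0]
```

After the second call only the first ticket remains in the table: the valid "VPN down" row is rolled back along with the orphan, which is the behaviour a high-value, consistency-first system wants.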

Will it scale to the real-world requirements?

There are some real scalability concerns with old school relational database systems. Adam Wiggins does a pretty good job of covering the techniques to scale a SQL database, though I strongly disagree with his end assertion.

You face those concerns on that glorious day the CEO calls to tell you that the board is super excited about your team’s help ticket system, built on SQL Server, and they want you to deploy it corporation wide. For data consistency purposes they want a single instance, instead of alternative deployment scenarios like pushing out an instance (“shard”) for each division.

Can you make it work?

When Money Is No Object

Of course you can. Even on the maligned Windows platform.

From a vertical scaling perspective — it’s the easiest and often the most computationally effective way to scale (albeit being very inefficient from a cost perspective) — you have the capacity to deploy your solution on powerful systems with armies of powerful cores, hundreds of GBs of memory, operating against SAN arrays with ranks and ranks of SSDs.

The computational and I/O capacity possible on a single “machine” are positively enormous. The storage system, which is the biggest limiting factor on most database platforms, is ridiculously scalable, especially in the bold new world of SSDs (or flash cards like the FusionIO).

Such a platform can yield very satisfactory performance for tens or hundreds of thousands of active users in most usage and application scenarios (where generally clients talk to a farm of middleware servers).

Of course if you index poorly or create some horrendous joins you can screw it up, but with competency it will be good times for all. Even with billions upon billions of help tickets.

For the purposes of the application, the scalability requirement is completely satisfied — total scalability is achieved in the context of the application.

But it doesn’t end there.

From a horizontal scaling perspective you can partition the data across many machines, ideally configuring each machine in a failover cluster so you have complete redundancy and availability. With Oracle RAC and Sybase ASE you can even add the classic clustering approach.
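
As a rough sketch of the routing half of that idea, the following maps a partition key to one of several database nodes. The node names and the choice of partition key are assumptions for illustration, and real platforms offer their own partitioning features that would normally be used instead of hand-rolled code like this.

```python
import hashlib

# Hypothetical shard names; in practice each would be a failover cluster.
SHARDS = ["db-node-0", "db-node-1", "db-node-2", "db-node-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a partition key to a shard with a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a digest is used to keep the mapping stable across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# All tickets for one division land on the same node.
node = shard_for("division:retail-banking")
```

Simple modulo routing like this remaps most keys when the number of shards changes, so a production design would more likely use range partitioning or consistent hashing.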

Such a solution — even on a stodgy old RDBMS — is scalable far beyond any real world need because you’ve built a system for a large corporation, deployed in your own datacenter, with few constraints beyond the limits of technology and the platform.

Your solution will cost hundreds of thousands of dollars (if not millions) to deploy, but that isn’t a critical blocking point for most enterprises.

This sort of scaling is at the heart of virtually every bank, trading system, energy platform, retailing system, and so on.

To claim that SQL systems don’t scale, in defiance of such obvious and overwhelming evidence, defies all reason.

And you don't need to spend a million dollars. A mid-level Dell server can easily handle the vast majority of real-world database needs: No, your project likely isn't going to have the needs of Twitter, Flickr, or Facebook. You can grab a four CPU Dell server hosting a total of 24 cores of latest-tech computing power, with 128GB of RAM, for around $15,000. That is beefier than the systems that ran many enterprises just a few short years ago.

Artificially Limited Scalability

Imagine that you’re a start-up building your big new Social Media site.

Obviously you don’t have your own datacenter, but instead you’re going with cloud servers to host your solution.

You don’t have the option (much less the finances) to buy and install a Unisys 7600R, or even a loaded Dell R905. You don’t have TBs of memory or massive I/O at your disposal.

Instead you have to go with the options available on a host like Amazon’s EC2, where the most powerful choice available is the High-Memory Quadruple Extra Large (!) option at $2.40 / hour (at writing), or about $21,024 a year, which is a fairly reasonable rate given that an equivalent purchased server would run you about ten thousand dollars up front.
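
The yearly figure is just the hourly rate annualized. A small sketch of the arithmetic, assuming on-demand pricing and an instance that is never switched off (other pricing arrangements would give a different number):

```python
from decimal import Decimal

HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years


def annual_on_demand_cost(hourly_rate: Decimal) -> Decimal:
    """Cost of leaving one on-demand instance running all year."""
    return hourly_rate * HOURS_PER_YEAR


yearly = annual_on_demand_cost(Decimal("2.40"))  # the rate quoted above
```

Decimal is used rather than float so that the currency arithmetic stays exact.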

This is very powerful compared to their historic maxed-out image — the puny large image that used to represent the top end — and is large compared to the max of many other cloud hosts, yet it is entry level in the RDBMS database world.

I/O on the EBS has been measured with a throughput in the 30MB/second range with about 72 IOPS per volume, which is one-half the speed that my Atom-based home NAS achieves. You can stripe multiple volumes into a software RAID array, but you quickly hit the limit of the I/O available to your instance.

For comparison we’re currently looking at an entry level $8K 36TB iSCSI device that would offer our database a dedicated 400MB/second throughput and about 1500 IOPS, and this is for a pretty humble low-criticality need with low-end magnetic drives.
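
To put the gap in numbers, here is a quick back-of-the-envelope comparison using only the figures quoted above. The "volumes needed" values assume striping scales perfectly, which is an idealization that the instance-level I/O limit rules out in practice.

```python
import math

# Figures quoted in the text above.
EBS_MBPS, EBS_IOPS = 30, 72          # per EBS volume, as measured
ISCSI_MBPS, ISCSI_IOPS = 400, 1500   # the entry-level iSCSI device

throughput_ratio = ISCSI_MBPS / EBS_MBPS   # about 13.3x
iops_ratio = ISCSI_IOPS / EBS_IOPS         # about 20.8x

# Volumes needed to match the device IF striping scaled perfectly,
# which the instance-level I/O limit means it will not.
volumes_for_throughput = math.ceil(ISCSI_MBPS / EBS_MBPS)  # 14
volumes_for_iops = math.ceil(ISCSI_IOPS / EBS_IOPS)        # 21
```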

As a speculative start-up you don’t want to commit $20K/year to have a single instance hanging around, especially given that your traffic is extremely variable and most of the time it will sit idle. You want to run the smallest database layer possible, ramping up if the need (fingers crossed) arises.

In an ideal world you could float along on a small instance economically until that big day when you get mentioned on Digg, at which point you spool up ten extra large instances, turning them off when the need passes.

These financial and artificial limits explain the strong interest in technologies that allow you to spin up and cycle down as needed. It’s why the old guard has largely remained quiet (because it solves a problem that they don’t have, notwithstanding any manufactured “my friend has a super-duper 512CPU Sun box and it is always overloaded!” scenarios), while a million hopeful start-ups with their small EC2 instances are loudly bleating about the limits of scalability with SQL systems.

The Needs of a Bank Aren’t Universal

The world of financial firms and retailers and other RDBMS users is very different than the popular social media scenario usually played out.

If you had to describe your social media data needs in three points, they would be-

  • Largely unrelated islands of data
  • Very low value user/transaction value
  • Data integrity is not critical. If you lose a Status Update, or several thousand of them, it will likely go unnoticed, or at least won't cause a major situation.

MySQL originally lacked many traditionally mandatory RDBMS elements, such as transactions, without which it is extremely difficult to maintain a high level of data integrity. That didn’t dissuade many of its boosters, who declared that such features were an unnecessary cost for the purposes for which they used it.

They were right. As MySQL has moved towards the values of traditional databases, it has moved away from its original bag-of-data values.

The truth is that you don’t need ACID for Facebook status updates or tweets or Slashdot comments. So long as your business and presentation layers can robustly deal with inconsistent data, it doesn’t really matter. It isn't ideal, obviously, and preferably you see zero data loss, inconsistency, or service interruption. However, accepting data loss or inconsistency (even just temporary) as a possibility breaks you free of by far the biggest scaling "hindrance" of the RDBMS world, and that can yield dramatic flexibility.
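
What "robustly deal with inconsistent data" might look like in a presentation layer is sketched below. The function name and the dictionary standing in for the data store are hypothetical; the only point is that a missing record is skipped instead of being treated as a fatal error.

```python
def render_feed(update_ids, store):
    """Return the text of each status update that can still be found.

    `store` is any mapping of id -> text (a hypothetical stand-in for a
    non-ACID data store). Missing or empty entries are skipped rather
    than treated as errors.
    """
    feed = []
    for update_id in update_ids:
        text = store.get(update_id)
        if not text:
            continue  # lost or not-yet-replicated update: leave it out
        feed.append(text)
    return feed


store = {1: "Lunch!", 3: "Back at my desk"}   # update 2 was lost
feed = render_feed([1, 2, 3], store)
```

The same tolerance in a banking ledger would be negligent, which is exactly the contrast being drawn here.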

This is the case for many social media sites: data integrity is largely optional, and the expense to guarantee it is an unnecessary expenditure. When you yield pennies for ad clicks after thousands of users and hundreds of thousands of transactions, you start to look to optimize.

The same efficiency applies to highly relational schemas — if you can just serialize object graphs and that’s all you need, why bother normalizing? Many would argue that it’s a premature optimization, but if it’s all you need it might be the best choice.
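
A minimal sketch of that "just serialize the object graph" approach follows. A plain dictionary stands in for a key/value store, and the key format and profile fields are made up for illustration.

```python
import json

# A plain dict stands in for a key/value store (hypothetical).
kv_store = {}

profile = {
    "user": "alice",
    "friends": ["annie", "steve"],
    "updates": [{"text": "Hello", "likes": 2}],
}

# One write stores the whole graph; no tables, joins, or foreign keys.
kv_store["user:alice"] = json.dumps(profile)

# One read gets it all back.
loaded = json.loads(kv_store["user:alice"])
```

The trade-off is that any question spanning users (say, every update with more than one like) means reading and decoding every stored blob, which is the kind of query a normalized schema answers cheaply.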

Both of those decisions would be outrageously negligent in many other industries, but the rules that apply for a banking system have woefully little applicability to a social media site.

SQL is Scalable and NoSQL Isn’t For Everyone

The point is one that I think all rational people already realize: The ACID RDBMS isn’t appropriate for every need, nor is the NoSQL solution.

A social media site is not an inventory system. A banking account management system is not a social news aggregator.

Picking and choosing database terminology from the Wikipedia entry on RDBMS’ doesn’t equip the speaker with an expert level of knowledge to declare the truth about the database industry.

Scalability noise based upon the limitations of a cloud vendor’s offerings needs to be put into context: They don’t apply to most of the users of relational databases.

MySQL isn’t the vanguard of the RDBMS world. Issues and concerns with it on high load sites have remarkably little relevance to other database systems.

And of course the SQL/RDBMS world is changing (sidenote: Few love SQL, but I’ve yet to see a viable replacement). Wouldn’t it be a grand world where every desktop (platforms that spend about 99% of their time completely idle) in a corporation was a part of the corporate cloud, all seamlessly acting as a part of the corporate information system in a reliable, redundant way? A simple SQL statement silently and transparently fulfilled by hundreds of distributed systems?

We’ll get there.

Aside: I'm currently building a solution (to fill this space) that significantly leans on Project Voldemort. I have somehow managed to remain rational.

Postnote

This is one of those rants that strangely gets attention, with several taking it as anti-NoSQL, or even pro-RDBMS, I assume because positions so often seem to be polarized. It is neither, which is quite evident if read with an unbiased mind: defending the real-world practical scalability of the maligned RDBMS merely brings accuracy to the debate. Several have asked if I'm merely attacking a strawman. Aside from the several specific links that I gave above (I am loath to add more, as I've engaged in blog-to-blog arguments too many times before), I find it hard to believe that these people take part in any technology discussion forum or group, where NoSQL is quite widely, and often without question, held up as the successor to the RDBMS...the new evolution of database systems.

The motivation of the post is that the discussion is, by nature of the venue, hijacked by people building or hoping to build very large scale web properties (all hoping to be the next Facebook), and the values and judgments of that arena are then cast across the entire database industry — which comprises a set of solutions that absolutely dwarf the edge cases of social media — which is really...extraordinary. It's a bit like moving to the bottom of the ocean and declaring that everyone should start using submarines to commute.

There have been edge conditions in the database world for as long as there has been an industry. High performance logging/data acquisition (often distributed), for instance, has always been a case where traditional RDBMS systems aren't suited, and thus should be jettisoned. The industry didn't rewrite the rules because of those fringe cases, however, for good reason.


Reader Comments

Brilliant. I've forwarded this to my team.

We make a tax solution and I've been dealing with vague "we should use NoSQL" comments from a few of the less capable members of the team. I love the movie reference because that is exactly what my impression is of the motives.
Jeff @ 3/2/2010 2:17:04 PM
Just saw this on dzone and absolutely love it. Thank you.
Steve J. @ 3/2/2010 2:52:29 PM
You just earned yourself a new subscriber.
..xxJoelxx.. @ 3/2/2010 2:54:15 PM
"You just earned yourself a new subscriber." -- same here :)
Annie @ 3/2/2010 5:33:02 PM
I don't think NoSQL people usually claim that SQL isn't scalable, just that it's unnecessarily complicated to scale.

You generally have to partition your data horizontally and thus give up many of the features that SQL has to offer: ACID transactions, unique keys, auto-increment primary keys, etc.

Then you have to come up with your own solutions to replace those features: eventual consistency, UUID keys, map/reduce, etc. And these happen to be exactly the kind of features that many NoSQL databases can give you out-of-the-box.
Kenneth Falck @ 3/2/2010 5:34:06 PM
Brian Aker's NoSQL roast is hilarious and educative - http://www.youtube.com/watch?v=LhnGarRsKnA
Ashwin Jayaprakash @ 3/2/2010 5:43:11 PM
see CAP principle!

no SQL is No SQL or Not only SQL.

we use Cache + SQL for scalable, see my framework.
banq @ 3/2/2010 6:00:12 PM
The part I agree most is: "In the case of the NoSQL hype, it isn’t generally the inventors over-stating its relevance — most of them are quite brilliant, pragmatic devs — but instead it is loads and loads of terrible-at-SQL developers who hope this movement invalidates their weakness."

Over at MyNoSQL (a blog focused on NoSQL technologies), most of the people talking about real-life uses of NoSQL mention the operational costs of RDBMS scaling (e.g. Twitter's usage of Cassandra: http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king), a point that your post seems to agree with too. Personally, I tend to believe that NoSQL and RDBMS are just tools for our job and there is nothing about the death of one or the other. But as we've learned over the years, every new programming language is the death of all its precursors, every new programming paradigm is the death of everything that existed before, and so on. The part that some seem to be missing or deliberately ignoring is that in most of these cases this death has never really happened. And I think the same will apply to NoSQL and RDBMS and OODB and so on.

As a side note, I'd say that the definition you are using for scalability, while not incorrect, is a bit too wide. I think the Wikipedia definition is more clearly delimited. I have also found Jonathan Ellis's presentation (http://nosql.mypopescu.com/post/413155873/presentation-what-every-developer-should-know-about) on scalability quite useful and easy to grok for everyone.

Last, but not least, while I do like your ending note: "SQL is Scalable and NoSQL Isn’t For Everyone", I'd say that an even better one would be "SQL is scalable. SQL scalability isn't for everyone. NoSQL isn't for everyone either".

bests,

:- alex
Alex Popescu @ 3/2/2010 6:11:40 PM
[Last, but not least, while I do like your ending note: "SQL is Scalable and NoSQL Isn’t For Everyone", I'd say that an even better one would be "SQL is scalable. SQL scalability isn't for everyone. NoSQL isn't for everyone either".]

I posted this entry despite feeling that I wasn't as balanced as I wanted to be, but it was the result of seeing yet another "SQL doesn't scale" post that pushed my buttons: the context is always framed against an environment where you only have access to low-end machines, which of course isn't the case for most who target the RDBMS.

But I completely agree with your assessment.
Dennis Forbes @ 3/2/2010 6:26:40 PM
The cost of an EC2 High-Memory Quadruple Extra Large server for one year is $13,733, not counting storage and bandwidth (which for a non-user facing database server would be negligible), not $21,000 as you stated. You probably made the common error of annualizing the on-demand price instead of using the reserved price.
Jeffrey @ 3/2/2010 7:06:28 PM
One thing that is always missing when we talk scalability and RDBMS is that reading is easily scalable, while writing only scales vertically.

Scaling horizontally means that you can place an identical box next to the original db box and have increased performance. When the first box is maxed out because of writing, adding a second box won't help you, since every time the first box is written to, the second one has to be written to too. So you end up with 2 maxed out boxes instead of one.

The next step is splitting your data into several databases and since RDBMS' are usually highly coupled, this is no easy task.

As you mentioned in the article, in a lot of cases a 100% integrity isn't needed all the time, which means a master write box and several read boxes is a possibility. But it comes down to trade-offs.

Personally I think the NoSQL group is doing a good job of spreading the word that RDBMS is not the holy grail of databases. It just doesn't fit every project :)
Thomas Winsnes @ 3/2/2010 7:36:06 PM
Interesting rebuttal and I agree that the scaling of SQL (either on one machine or horizontally) is fine for 99% of apps today.

My biggest question after reading it though is how would you recommend scaling an app like Twitter or Facebook with SQL?
Brian Armstsrong @ 3/2/2010 8:16:44 PM
i had the pleasure of being exposed to a key/value store on a massive scale at yahoo going back to the 90s. my impression was that they were best used in conjunction with a rdbms, not to the exclusion of a rdbms.

in the case of yahoo, there are certain elements of data (userid, signup date, etc) which are essential anywhere on the site, but users end up on one part of the site with specific, complex functionality (sports, news, etc), and thats where the rdbms comes in...providing specific, complex functionality for a branch of the site...a small enough branch that the problems of scale become tractable
whoop dedo @ 3/2/2010 8:17:26 PM
The distinction between your typical RDBMS solution and your typical NoSQL solution is that the first one has a real problem handling billions of rows efficiently and the second one is built to handle this natively. We all know that RDBMS solutions are great in the majority of scenarios, but once you start talking about billions of rows, and that kind of volume, things start to get ugly and *slow*.

Financial industry might use Oracle for storage, but for analysis, trying to parse through billions of rows quickly, doing it across thousands of machines simultaneously, NoSQL solutions and other concepts (beowulf clusters, etc), start to make sense. Each have their strengths, weaknesses, reasons for using one, reasons for not using the other, etc.
brian @ 3/2/2010 9:16:13 PM
Hey there Brian.

While I've had pretty good luck with the RDBMS with even billions upon billions of rows, for analysis and calculations we use both a custom native data engine, and MonetDB (a very high performance columnar RDBMS), which we interact with via MAL, which is essentially the assembly language for the database system.

As you said, each has its place and strengths and weaknesses. I wouldn't use MonetDB for a OLTP DB or even a DW, but it is absolutely brilliant for analysis.
Dennis Forbes @ 3/2/2010 10:01:10 PM
Your description of a million web startups running on small EC2 instances lamenting scalability of SQL is funny yet true.

I too am a startup hoping to spin up the extra large instances someday, but providing a path forward for startups is exactly the point. Scalability of SQL is expensive or complicated or both, which you've admitted to in your post. Neither option is good for startups, so of course they will say SQL is a burden and there needs to be something better to take its place.
Lida Tang @ 3/2/2010 10:59:51 PM
There is just one correct definition of the term scalability. Adding more RAM to your machine isn't scalability, it's a workaround.

http://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html
Sebastian @ 3/3/2010 3:01:45 AM
No, there isn't, Sebastian, though it's not surprising that you link to a NoSQL advocacy site as the definer of the one true definition of scalability.

Vertical and horizontal scalability have been a staple of RDBMS' for decades, yet recently the NoSQL camp has decided to gradually toy with the definition until it can only possibly exclude RDBMS, which is a remarkably cheesy tactic to get attention.

It is never completely linear scalability, as it isn't with any solution. Nor is it endlessly scalable, as it isn't with any solution. However once you've passed a certain level for the solution it is an advocacy tool that has little relevance to the real world, but that is the case for most NoSQL chatter (the *solutions* have a lot of applicability to a whole category of solutions, but the advocates are largely doing nothing but talking about how great it is).
Dennis Forbes @ 3/3/2010 4:52:16 AM
Wouldn't agree about NoACID. CouchDB gives strong ACID guarantees for instance.

Scaling to maintain latency at very high load is a key driver for many NoSQL solutions, but not the only one.

There’s also scaling (distributing) to meet geographies (likely requiring high levels of partition tolerance), and scaling solutions to meet problem complexity (choosing the wrong way of modeling data here can cause big problems) – areas that different NoSQL solutions seek to provide benefits. There’s also scaling to report on large datasets (not just the transactional load that he discusses) where the dataset can’t fit on a single box – the kind of thing that Hadoop excels at.

Also, relational models are not a fit for every problem. This is a large reason for a lot of the NoSQL stores out there, such as Neo4J, Riak, CouchDB, MongoDB, DB4O, etc…

Don't forget also a strong driver to run on commodity hardware in a lot of cases, something which can lower operational costbases, and open many interesting opportunities up.

I'm not defending 'Death to RDBMS mania' - but I think that the current 'big' implementations of RDBMS theory are increasingly unwieldy and expensive to own. Setting up this strawman & then arguing against it probably offers no more than the strawmen you cite.
@NeilRobbins @ 3/3/2010 8:00:19 AM
The fact that this definition comes from the initial advocate of those systems doesn't make it less true. And it sure is linearly scalable. But I agree very strongly that 99% of the people hyping it will never actually grow to a size where Memcache wouldn't solve all of their problems.
Sebastian @ 3/3/2010 11:44:09 AM
I think you've missed a typo in "Very low value user/transaction value" -- "value" is mentioned twice
Max @ 3/3/2010 12:45:09 PM
As with most things, I think the truth lands not in the black or white but squarely in the gray. My job within IBM is to place these products in their respective places and you can imagine that's a challenge considering we're highly invested in every position on the spectrum, from the huge DB2 EEE deployment to the in-memory SolidDB and eventually the more NoSQL-like WebSphere eXtreme Scale.

It all comes down to specific applications. I learned the first day on the job that generally these non-RDBMS technologies aren't specifically replacing 'long haul' legacy database technologies. Generally data held this long ends up being needed for some sort of reporting, and it's hard to beat an RDBMS at reporting tasks as they're inherently 'single answer from a lot of data' type operations.

Where we've found traction with the scaling derivatives and horizontal data grids is exactly those scenarios where we didn't know we needed to grow, or we needed to grow on very rare occasions. It's true that horizontal flavors of RDBMS technology exist but beyond the established limitations they're also HARD to work with. They don't incorporate the 'elastic' nature of many of these scale products and as such there is a need, and a large one, for an alternative.

I would never be replacing SQL/RDBMS in every case though. It's about having a tool box.. not a single tool.
Rob Wisniewski @ 3/3/2010 2:02:58 PM
> Wouldn’t it be a grand world where every desktop (platforms that spend about 99% of their time completely idle) in a corporation was a part of the corporate cloud, all seamlessly acting as a part of the corporate information system in a reliable, redundant way?

This sounds good on paper but in actual use the problem arises the moment the user tries to use "his" computer. By virtue of rather uncooperative memory management, all his stuff has been paged out to make room for the "unintrusive" background task.

There are many issues related to this on the BOINC bug tracker.
Alex @ 3/3/2010 2:23:41 PM
Very balanced, I liked it. I've managed Sybase to multi-million row databases with millisecond access, and I've seen (and fixed) similar systems that took minutes on 100's of rows. Design and analysis are essential, but if I've seen anything, it's that with any database, SQL or NoSQL, bad design is going to cost. There are no magic bullets with databases; you can screw up either.
Randall Jordan @ 3/3/2010 2:53:48 PM
Very good post indeed. Both the options are best suited for meeting industry specific needs.
Moiz @ 3/3/2010 4:26:54 PM
Very nice write up. But can be distilled into a few sentences:
- SQL is not the target of NoSQL, ACID and RDBMS is*
- NoSQL is not for everyone
- RDBMS is the simplest option
- Traditional RDBMS'es don't scale economically**

* - I would argue that ACID is not a target of NoSQL movement, since there are projects that are focusing on ACID in the NoSQL world. GigaSpaces is one of them.
** - Oracle RAC will cost an arm and a leg with at least one kidney thrown in down the line
JAlexoid @ 3/4/2010 3:21:20 AM
Fact is that the RDBMS still is the simplest option and, in most of the times, the most logical one. Sadly, even when not working with Oracle, it will cost you a leg, arm, kidney, eye and maybe your soul.

NoSQL seems like a viable future solution for many applications, but I would wait for more case scenarios to be considering a study over it.

Still cool article :)
Cristiano @ 3/4/2010 6:05:40 AM
My sentiments exactly ... most of the NoSQL advocates strike me (as an enterprise developer) as irrelevant. We have the money and political willpower to pay for Oracle RAC and set up our servers on racks of SSDs.

Of course, if I were working on a social networking startup, I'd happily consider a NoSQL option, because the data store requirements are different (as you point out).

Thanks for being one of those rare species in our industry: reasonable :)
Alan @ 3/5/2010 8:45:19 AM
Awesome thread.

I'll offer yet another attempt to net out the NoSQL vs. traditional SQL DB differences:

1. Transaction handling - ACID or not (NoSQL)
2. Access language - SQL or not (NoSQL)
3. Schema - Relational or not (NoSQL)
4. Scalability architecture - Scale up or scale out (NoSQL)

These 4 differences don't universally apply to all NoSQL databases (based on what I've seen)...usually 1 or more apply.

NOTE - the 4th item, scalability...I admit there's probably better language to use there, and the different approaches to scaling (partitioning schemes, MPP architectures, sharding, et al) probably deserve their own thread. Hopefully the right gist comes across.

Mike Stonebraker (mentioned at the top of the article, and inventor of Postgres) has developed a fast, super-scalable new ACID/SQL/Relational/Scale-out OLTP DBMS: VoltDB (http://www.voltdb.com). It's in beta now (note: I work for them).

VoltDB is different in architecture from the current NoSQL products, but like NoSQL, offers better scalability than traditional SQL DBMS.

Other new DBMS architectures coming to market: NimbusDB (written by Interbase developer, Jim Starkey), Akiba, Basho.

It should be an exciting year in DBMSs.
Andy E @ 3/7/2010 4:56:28 PM
"You just earned yourself a new subscriber." -- same here :)
Sorin S @ 3/9/2010 3:10:53 AM
A good article and its obvoisly not only me that thinks a lot of moderen developers witha rather shallow technical knowledge dont understand why a RDBMS is for most cases a good way of storing and retrieving.

Having worked on some big (at the time) non-RDBMS-based systems (I helped develop a billing system for BT that ran on an ISAM-based system and was coded in FORTRAN and PL/1G), I much prefer not having to worry about ACID.

PS: Yes, I did develop a billing system in FORTRAN, so I guess that makes me a REAL programmer :-)
Maurice @ 3/13/2010 3:38:37 PM
Dennis,

I think you nailed it. The NoSQL conversation is largely being led by folks at companies building websites that are swimming in recent Stanford grads who like to code at a low level, all want to be the next Facebook and ergo care about massive scale, like the idea of open source, don't want to spend money on software, and indeed actually have no requirement for the high consistency/transaction standards offered by DBMSs. Who cares if you lose some tweets?

A scary result would be regular companies trying to use these same tools either when they lack the talent to do so (think: sharp objects) or actually do have DBMS requirements.

My own post(s) on NoSQL are here: http://www.kellblog.com/2010/02/24/the-database-tea-party-the-nosql-movement/

In terms of successors to RDBMS, I do believe that one day XQuery will have its day in the limelight. It was a redo of SQL done by SQL's creators. All major RDBMSs have implemented it. But there is no commercial pressure for wide rollouts because it's disruptive and against the RDBMS vendors' commercial interests.

Best,
Dave
Dave Kellogg @ 3/18/2010 7:40:04 AM
Very nice post indeed. Here are some links that also cover the issue of RDBMS Primacy Downgrade (i.e., RDBMS is the solution to everything, so it's getting a Primacy downgrade as opposed to becoming irrelevant).

Links:

1. http://bit.ly/euz2O -- RDBMS Era Primacy Downgrade is Nigh post
2. http://bit.ly/cCtrzz -- Report from recent DBMS tech gathering re. how to bridge DBMS realms
3. http://bit.ly/aajeBE -- Round up from last year's VLDB conference where RDBMS and Web misconceptions were discussed at length.

Kingsley
Kingsley Idehen @ 3/18/2010 8:44:08 AM
Better go tell the folks at developer.force.com that SQL databases don't scale.
JB @ 3/19/2010 5:20:38 AM
I'm under the impression that most of the people who think they need NoSQL actually need a columnar database, especially those who talk about billions of rows. But there are plenty of real-time applications where the overhead of parsing SQL is in fact a deal-breaker.
James @ 3/24/2010 8:28:32 PM
"You just earned yourself a new subscriber." -- same here :)
Johan Hernandez @ 6/8/2010 10:00:25 AM
Great blog! Get the feeling there are no truly 'right' answers with databases? I tend to agree with most that RDBMS will rule, and with very good reason. See Joel's take on fundamental issues and why key pairs/NoSQL XML/XPath will never compete: http://www.joelonsoftware.com/articles/fog0000000319.html

I like SQL/RDBMS but feel that they have a lot of old-time baggage and really need a 21st century rethink. But they work and are scalable at a cost. I just wish I could afford it!
TimB @ 7/2/2010 3:00:43 AM
Yeah.. (thumbs up) !!

The comparison here sounds like "Java is dying, and RoR rules" ''||

Seems like they are going to extremes, from one side to the other.
kiwi @ 8/1/2010 10:38:41 PM
Yours is the most balanced article on NoSQL that I have read yet, and I have been in this industry a looong time. I congratulate you. I'm in an airport now or I would write more. Well done and thank you. I love the quote on "hype".
Tom Haughey @ 8/18/2011 7:55:18 PM
Come on!

I agree with the underlying message of your post, but your examples are not really good, especially "when money is no object".

One of the good things about scalability is that, in the same deal, you also get High Availability "almost for free".

Your "Money is No Object" case does NOT scale past a certain point, no matter how much money you throw at it and, more importantly, you are designing a very critical SPOF (Single Point Of Failure). If that expensive rack room catches fire you may have a very expensive service disruption or even lose your data forever. Data NEEDS to be replicated in different locations, especially when:

"
- High-value users and transactions
- Data consistency and reliability is a primary concern
"

Today, if you have to provide an Internet service where you need ACID semantics (for instance, online banking), you are quite fuc... up!

In this case, NoSQL is NO solution for you (I agree) and SQL, although it is still the only way, is quite a pain.

No matter how much money you have, you cannot afford long service disruptions NOR losing your customers' data FOREVER, and it is better (and cheaper) to go for a somewhat less vertically scalable deployment (don't put all your eggs in one basket...) BUT a more REPLICATED and maybe also SHARDED database divided up into different datacenters for HA, Scalability AND Latency reasons (it makes sense that your European users go to the European datacenter while the American ones go to the American one and the Asian ones go to the Asian one, etc).

The truth is that NoSQL (or BASE) solutions fall TOO short in features to be able to threaten traditional SQL solutions. They can solve some scalability scenarios but not all.
josvazg @ 11/4/2011 12:34:48 AM
By the way, it seems the guys at Xeround are making this discussion deprecated by the minute:

http://xeround.com/cloud-database-comparison/amazon-rds-feature-comparison/

Xeround Cloud DB is an SQL database for the cloud. For the time being you have a MySQL interface for your relational apps, and by changing your DB driver config your DB is scaled for you: it can grow and shrink on demand (it runs on Amazon and Rackspace, with other clouds to come).

It seems the recipe is to use a SQL front end, fully ACID as it seems, backed by a NoSQL-type storage back end. So basically you get the benefits of NoSQL scalability without losing the features you expect from a relational database.

How they broke the law of "ACID can't scale" is a mystery, but it seems they got away with it.
josvazg @ 11/22/2011 2:07:05 AM
Beautiful. Internet needs more of this shit.
David @ 4/9/2012 8:20:24 AM


About the Author
Dennis Forbes is a Toronto-based software architect. While focused primarily on the .NET and SQL Server worlds, Dennis frequently ventures outside of this comfort zone into game development and image processing. He has been published in several industry magazines, has been quoted in the Wall Street Journal and has been interviewed by NPR.

He is a vice president and lead software architect at an innovative New York City hedge fund back-office services firm.

Dennis has been working on solutions for the financial, telecommunications, and power generation markets for over 15 years.





 