Dennis Forbes on Pragmatic Software Development
Monday, July 31 2006

"That design might work for a stateful desktop app, but it isn't appropriate for the stateless web."

"O/RM isn't appropriate for stateless environments like HTTP!"

"This component wasn't made for the stateless environment of HTTP!"

"...but HTTP is stateless!"

If you've done any sort of web development, you've probably heard proclamations like these. You may have even made them yourself.

But what do they really mean? Do they add any value to the conversation?

So What Does Stateless Mean Anyways?

Stateless refers to an architecture where each HTTP request is fundamentally detached from requests that came before, and unrelated to requests that will follow.

In a stateless world, the browser initiates a TCP connection on port 80 (traditionally), or on port 443 for a secure connection, and then sends some basic commands, such as the desired document (e.g. /images/coolpicture.jpg), along with per-request preferences like the user's desired language.

With no prior information about the caller, acting only on the information in the new request (e.g. the document requested, along with user-submitted form values), the server sends the results.

> GET /images/coolpicture.jpg

< the binary data for /images/coolpicture.jpg...
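The following minimal Python sketch illustrates such a stateless server. The in-memory `DOCUMENTS` table and the `handle_request` function are hypothetical stand-ins for files on disk and a real web server. The response is computed from the request line alone, and the server keeps no memory of any client.

```python
# Hypothetical in-memory "document root" standing in for files on disk.
DOCUMENTS = {
    "/images/coolpicture.jpg": b"\xff\xd8...jpeg bytes...",
    "/index.html": b"<html>Hello</html>",
}


def handle_request(request_line: str) -> tuple[int, bytes]:
    """Serve one request using only what the request itself contains.

    Nothing is read from or written to any per-client store, so the result
    depends on the request line alone: the same request always yields the
    same response, no matter who sent it or what came before.
    """
    method, path = request_line.split()[:2]  # e.g. "GET /index.html HTTP/1.0"
    if method != "GET":
        return 405, b"Method Not Allowed"
    body = DOCUMENTS.get(path)
    if body is None:
        return 404, b"Not Found"
    return 200, body
```

Because nothing survives between calls, the same function can serve any number of clients in any order, which is what made the model cheap to scale.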

After the single request is serviced, the connection is torn down in this stateless scenario. The goal was to service each request as quickly as possible, freeing the resource-heavy, finite-quantity connection to service other callers.

Maximum output with minimum resources.

This served the early web very well. Mirsky's Worst of the Web could be served out to thousands of anonymous consumers with gusto on minimal hardware, fulfilling the liberal information sharing origins of HTTP.

Stateless In The Non-Internet World

For a historic analogy, think of the 411 telephone service - you dial the number and establish the connection. You tell the operator the person whose number you require, and they provide a number in response. The call is disconnected, freeing the line and the operator for the next caller.

This is stateless in that the service relies upon no contextual information preceding the call to provide the service, allowing a small number of operators and connections to handle a large number of lookup requests, needing no resources beyond a simple phone book.

A stateful 411, on the other hand, would be one where you called 411 and left the phone off the hook, maintaining the connection for perhaps days at a time. With each number lookup request, they would try to interpret what you really mean based upon the requests that came before.

"Earlier you asked for a bait store on Main street, and now you're looking for a tackle store. I'm going to guess that you probably want one on or near Main street. The number is..."

Such a stateful connection wouldn't even require you to maintain the call - they could just pull up your records based upon the calling phone number, immediately having the history of your interactions to draw from in a stateful manner, regardless of the transience of the individual call.

Stateful Back In The Internet World

HTTP was described as stateless in contrast to existing services like telnet and FTP, where a TCP connection (itself a stateful protocol) was made, after which state was maintained and modified from command to command: whether you were logged in, what directory you were in, what application was running, and so on.

The state was alive and changing until the connection was dropped, with a block of server resources dedicated to keeping alive a world just for you.

That design worked for those services because connections were generally "higher value" per request: a long-running file transfer that couldn't serve many clients anyway, given the large number of bytes per request; a professor running some batch jobs; and so on.

Bridging the Gap

Most readers will know that almost all websites these days appear to be stateful.

You log on. It presents data that is specific to you, using preferences that are individual to you. As you do things, the environment changes and adapts, incorporating your interactions into following requests.

This isn't just an illusion, or a bastardization of the web: THESE WEBSITES ARE STATEFUL.

So how did the web sneak up and become stateful on everyone? Generally via the magic of cookies (or, alternatively, via session identifiers appended to the URL to simulate cookies), an addition to HTTP first implemented by Netscape back in 1994.

A session cookie is often nothing more than a unique identifier (ideally with enough entropy that users can't guess each other's session identifiers, such as a randomly generated GUID) passed to the server on each request. This lets the web server tie requests together, building a set of session data that provides state for a given client. The logon form changes how the home page renders, which changes the topic listing, which changes the calendar selector, which changes the news view, and so on. Each page has available a set of stateful information about the client, forming a sort of virtual "persistent connection" over many individual, seemingly isolated HTTP requests.
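The following is a minimal Python sketch of the session-cookie mechanism described above. The cookie name `session`, the in-memory `SESSIONS` dictionary and the `handle` function are illustrative assumptions and do not come from any particular framework's API. A production store would also need expiry, persistence and concurrency control. The identifier comes from the standard library's `secrets` module, which is intended for generating unguessable tokens.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side session store: session id -> per-client state.
SESSIONS: dict[str, dict] = {}


def handle(cookie_header: str, path: str) -> tuple[str, str]:
    """Return (Set-Cookie value, or "" if none is needed, body) for one request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    sid = cookie["session"].value if "session" in cookie else None

    set_cookie = ""
    if sid not in SESSIONS:
        # New client, or an unknown/guessed id: mint a fresh, unguessable one.
        sid = secrets.token_urlsafe(32)
        SESSIONS[sid] = {"history": []}
        set_cookie = f"session={sid}; HttpOnly; Secure"

    state = SESSIONS[sid]
    state["history"].append(path)  # later requests can see earlier ones
    return set_cookie, f"You have visited: {', '.join(state['history'])}"
```

Each call is still an isolated request. The cookie lets the server find the state built by earlier calls, and that lookup produces the virtual "persistent connection".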

"Ha! Got You! There Isn't A Constant Connection! So It's Stateless!?"

Setting aside the fact that HTTP connections are now reused (a client will often request dozens of documents or more to build a single page, or in the case of Digg about 37,528, so it was found to be cheaper to let the client reuse an established connection for multiple requests), people often argue that HTTP isn't "stateful" because it doesn't maintain a constant connection for the entire session.

Yet what is a connection? In this case it would be TCP, a "stateful" protocol. TCP is stateful in that its behavior depends on what has happened before: each packet during a connection relies upon the packets before it getting through okay.

You can establish a connection, let it sit for a while, and occasionally pass data back and forth.

TCP is stateful in contrast to IP (or its very light encapsulation, UDP), which deals in individual packets that live or die on their own, with no awareness of packets that came before or those that will follow.

But wait, isn't it TCP/IP? TCP on top of IP?

Why yes, it is. TCP is fundamentally "IP with cookies": it maintains session state, tying many stateless packets together into a nice, clean stateful correspondence. This differs little from HTTP with cookies, a fundamentally stateful combination in virtually any post-1996 implementation, where sessions and statefulness are the norm.
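The "IP with cookies" analogy can be sketched as a toy receiver in Python. This is not real TCP. There is no handshake, acknowledgement or retransmission, and `seq` here counts datagrams rather than bytes. The `Datagram`, `Connection` and `Receiver` names are hypothetical. The sketch shows only how a key carried in every independent packet lets the receiver look up state built from earlier packets, much as a session cookie does for HTTP requests.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Datagram:
    """A stand-alone packet: it carries everything needed to deliver it."""
    src: tuple[str, int]  # (address, port)
    dst: tuple[str, int]
    seq: int              # toy sequence number: counts datagrams, not bytes
    payload: bytes


@dataclass
class Connection:
    next_seq: int = 0
    pending: dict[int, bytes] = field(default_factory=dict)
    received: bytearray = field(default_factory=bytearray)


class Receiver:
    """Ties stateless datagrams together into per-connection byte streams.

    The (src, dst) pair plays the role a session cookie plays in HTTP:
    it is the key used to find the state built from earlier packets.
    """

    def __init__(self) -> None:
        self.connections: dict[tuple, Connection] = {}

    def on_datagram(self, d: Datagram) -> bytes:
        conn = self.connections.setdefault((d.src, d.dst), Connection())
        if d.seq >= conn.next_seq:
            conn.pending[d.seq] = d.payload  # buffer until any gap is filled
        # Deliver every buffered datagram that is now in order.
        while conn.next_seq in conn.pending:
            conn.received += conn.pending.pop(conn.next_seq)
            conn.next_seq += 1
        return bytes(conn.received)
```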

The Web Isn't Stateless!

So why does everyone keep yabbering nonsense about HTTP being stateless (pedantically true, but practically irrelevant and entirely misleading)? Why do so many people talk about the web being stateless in the face of endless contradictory evidence?

I think it's just a cop-out: people want to excuse their crappy web apps (possibly out of laziness, or a desire to migrate back to fat apps), so they cling to the claim that a fundamental limitation of the platform limits their abilities, constrains their design, or forces them into hackish implementations.

In reality, the web that we've been developing against for the past 10 years has allowed tremendous statefulness, including building up and maintaining enormous quantities of server-side state for every session (just like a fat app or a DCOM component). That such an approach isn't appropriate for a very high-volume, low-value-per-transaction website serving anonymous users should in no way dictate how you implement a low-user-count, very high-value-per-transaction vertical-market web app.

You have the ability, and the mandate, to do what's right for the problem, and no one solution or dogma fits all web needs.

Reader Comments

You made an error of equating "stateless" with "no constant connection", and went down from there.

Stateless doesn't mean that there is no constant connection; it means that the client has no idea of any change in the server's state until it makes another request. If HTTP were stateful, clients would learn of such changes as soon as they happened, which implies two-way communication. Of course, this is possible with a constant connection, but it is distinct from it.

Web protocols are stateless, and that's what makes them so robust and useful. Any mechanism to provide stateful communication between clients and servers is designed in spite of it.

Your example with the phone line is completely off the mark. A much better example of stateful communication would be an open line with a support center (OK, this is an impossible scenario, but who cares). If they have only a network support expert on site, you won't get much help on databases; but the instant a database person comes on board, you know it, and you can ask questions in that field.
Berislav Lopac @ 8/1/2006 5:04:35 AM
Sorry Berislav, but I completely disagree with pretty much all of what you've said. :-) (makes it easier as it saves quoting blocks anyways)

In the web world, what people mean by stateless isn't a lack of a reliable and constant backchannel (in fact I can't recall anyone ever bringing this up in the context of the stateless argument, though of course some implement "events" through frequent background polling, especially now with "AJAX").

Pedantically and minimally, HTTP is stateless, exactly as I said, but realistically it most certainly isn't. That often-changing cookie (or set of cookies) means that each request occurs in a context that differs from the one before and the ones that follow, which pretty much flies in the face of the original stateless design of HTTP. In the web application world, the common model for well over a decade has been for the web application to create a "session", with state, on the first request from a given user. That makes the medium effectively and practically stateful, and renders all of the off-the-mark "stateless" comments a relic of the first half of the 90s.

And per my 411 example, search the web a bit for the meaning of stateful and stateless -- it correlates exactly with my definition, and my use in examples.
Dennis Forbes @ 8/1/2006 6:10:18 AM
Good article, and good thoughts from a certain point of view, but I think that there are other definitions of 'stateless' that are more relevant here.
I am an enterprise developer, and stateless (in our world, at least) refers to idempotency. In simple terms, an operation is stateless if it can be repeated (ad nauseam) with no change to the output.
For example, f(x) => f(x') holds true for all time no matter how many times f is called on x.
Note that this is distinct from a sequence of operations. If we have the sequence:

f(x) => f(x')
g(x') => g(x'')

Then it may be true that the sequence must hold in that order for the system to stay in a consistent state. Or not, depending on circumstances, but this isn't salient to our main discussion.
So, in summary... yes, web protocols are stateful. Also, web applications hold state. However, 'stateless' refers specifically to the idempotency of a (networked) system.
The classic example of this is the shopping cart example. The stateless (and correct) implementation of a shopping cart is that no matter how many times you click 'Check Out' you only ever get what was in your original cart (and nothing from anyone else's cart). A stateful (and incorrect) implementation of the shopping cart would continually add duplicates of your shopping list to the check out every time you clicked Check Out.
In mathematical terms:

Stateless: checkout(x) => checkout(x') forever.

Stateful: checkout(x) => checkout(x') => checkout(x'') => ...
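This distinction corresponds to what HTTP calls idempotent requests. The following minimal Python sketch assumes a hypothetical in-memory `ORDERS` store and a one-time order key that the server issues when it renders the checkout page. The function names are illustrative. The first version adds another copy of the cart on every click. The second records the order once and returns the existing order on retries.

```python
import uuid

# Hypothetical order store: one-time order key -> order contents.
ORDERS: dict[str, list[str]] = {}


def new_order_key() -> str:
    """Issued once when the checkout page is rendered (e.g. as a hidden field)."""
    return uuid.uuid4().hex


def checkout_non_idempotent(orders: list, cart: list[str]) -> list:
    """Every click appends another copy of the cart: the incorrect behavior."""
    orders.append(list(cart))
    return orders


def checkout_idempotent(order_key: str, cart: list[str]) -> list[str]:
    """Repeated clicks carrying the same key return the one original order."""
    if order_key not in ORDERS:
        ORDERS[order_key] = list(cart)  # the first request creates the order
    return ORDERS[order_key]            # retries just read it back
```

The idempotent version still keeps server-side state. Repeated submissions are harmless because each request identifies which order it refers to.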

So I think you're right in saying that the Web is stateful, but when people say 'stateless' they mean something (subtly) different.

Cheers!
Ben
Ben @ 8/1/2006 7:23:08 AM
Excellent points, Ben. However, I'd fervently argue against your model of statefulness (which sounds more akin to transactional integrity) being the common definition (in fact it seems that everyone has their own definition, all derived from HTTP's origins as a "stateless" protocol, which it originally was, entirely). I've worked on huge, organization-wide enterprise web apps, large-scale public but no-value-per-transaction sites, and of course countless internal apps. I've heard the stateful/stateless comments countless times, I've read them in commentary, and their intended meaning correlates closely with my statements.
Dennis Forbes @ 8/1/2006 8:21:32 AM
Yes, there's loads of state in many web applications, and appropriately so. However, there are big advantages to being very aware of what state your web application relies upon and, when possible, reducing it dramatically.

Applications with inappropriate state lead to several major problems in web usage. For example, they tend to overly constrain the user. Ever use a web application that tells you not to use the "Back" button in your browser, but to instead use some other mechanism to navigate back to a page you've already seen? Generally, this is a poor workaround for an overly stateful application: the server-maintained state knows what page you're on, and if you navigate your client to a different state without properly notifying the server, the client's view of the state and the server's view will diverge. A poorly written server won't be able to cope with the problem.
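The following few lines of Python reproduce this divergence. Both wizard classes are hypothetical illustrations. The first relies on its own record of which step the user is on. The second expects each submitted form to carry its step number, for instance in a hidden field or the URL, so a resubmission after pressing Back is stored under the correct step.

```python
class ServerTrackedWizard:
    """The server remembers which step the client is on (fragile)."""

    def __init__(self) -> None:
        self.current_step = 1
        self.answers: dict[int, str] = {}

    def submit(self, value: str) -> int:
        # Assumes the form being submitted is the one the server last served.
        self.answers[self.current_step] = value
        self.current_step += 1
        return self.current_step


class RequestCarriedWizard:
    """Each form posts its own step number, e.g. in a hidden field or the URL."""

    def __init__(self) -> None:
        self.answers: dict[int, str] = {}

    def submit(self, step: int, value: str) -> int:
        self.answers[step] = value
        return step + 1  # the next page to show
```

Suppose a user answers steps 1 and 2, presses Back twice, and resubmits step 1. The first class files the new answer under step 3. The second overwrites step 1 as the user intended. The answers are still server-side state, but they are keyed by what the request says rather than by what the server assumes.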

There are many other downfalls to server-maintained state, such as lower reliability and scalability.

In general, I encourage web developers to build stateless applications whenever possible and to very carefully design any necessary state management in their application with an eye on actual web use cases, where users use back & forward, make bookmarks, and e-mail out links to pages deep inside the application's function.

Thus, while many web applications are not, in fact, truly stateless, approaching web application design from a premise of stateless functionality is generally, I find, the best initial approach.
Tim Dierks @ 8/1/2006 11:24:25 AM


Dennis Forbes - Dennis Forbes is a Toronto-based software architect and technology writer