Dennis Forbes on Software and Technology   Subscribe to RSS


About the Author
Dennis Forbes is a Toronto-based software architect. While focused primarily on the .NET and SQL Server worlds, Dennis frequently ventures outside of this comfort zone into game development and image processing. He has been published in several industry magazines, has been quoted in the Wall Street Journal and has been interviewed by NPR.

He is a vice president and lead software architect at an innovative New York City hedge fund back-office services firm.

Dennis has been working on solutions for the financial, telecommunications, and power generation markets for over 13 years.


Recent Entries


The Feed Bag
Jan 11 - Answer: No
Jan 11 - The Git DVCS

 
Sunday, May 06 2007

Many garbage-collected languages feature "immutable" strings -- once a value is assigned, it can't be manipulated without creating a new instance (either explicitly or implicitly), with obvious performance implications.

Let's start with a scenario where you're concatenating the string variable B and C into the resulting string A.

A = B + C;

String Concatenation

String Concatenation

Because of the immutability of strings, this is effectively what you're doing even when you're simply concatenating onto a string, and it's for this reason that string-concatenation is often a ghastly slow affair. Many developers have faced this when building a string through concatenation in a loop, for instance dynamically generating a TABLE and its content, or building an XML document.

Consider the following:

myString = myString + "A";

As visualized above, generally that would cause the runtime to create a new memory allocation large enough to hold the concatenated length, then copy the first value and then the second value to the new buffer.

In a loop, where a string is continually adding new content to the end of itself -- a very common need -- the performance will generally take quadratic time (as it's repeatedly allocating and copying a larger and larger string).

I've encountered this frequently enough in .NET, deciding when the ugliness and heft of a StringBuilder is necessitated versus when to just suck up the terrible concatenation performance, however this came up recently in a web project requiring a large amount of display data to be dynamically constructed on the client end.

JavaScript doesn't have a native string builder (a big oversight), so I needed to evaluate how basic string concatenation compared with some of the hackish string builder similes for disparate cases. I was quickly reminded that while Firefox doesn't have great string concatenation performance, it absolutely annihilates Internet Explorer (incl. 7) doing the same.

Try this basic benchmark in both browsers to see for yourself: On the machine I'm typing this on, Firefox completes 131,072 iterations in 1984ms, while Internet Explorer had only completed 32768 iterations after 11078ms (not only did just 1/4 the iterations take 5 times as long, but the iterations it did complete were the smaller, much easier ones, making the disparity much greater than it appears).

I became curious how Firefox implemented string concatenation so performantly, and from that to extrapolate how the Internet Explorer team screwed it up so royally, and honestly to see if I could easily tease even more performance out of it. Given that we have the source (and the ability to easily change it and build it to test differences), I jumped into js/jsstr.c in the Firefox source to take a look (all code subject to the MPL and the Mozilla Foundation retains all rights).

JSString *
js_ConcatStrings(JSContext *cx, JSString *left, JSString *right)
{
    size_t rn, ln, lrdist, n;
    jschar *rs, *ls, *s;
    JSDependentString *ldep;    /* non-null if left should become dependent */
    JSString *str;
    if (JSSTRING_IS_DEPENDENT(right)) {
        rn = JSSTRDEP_LENGTH(right);
        rs = JSSTRDEP_CHARS(right);
    } else {
        rn = right->length;
        rs = right->chars;
    }
    if (rn == 0)
        return left;
    if (JSSTRING_IS_DEPENDENT(left) ||
        !(*js_GetGCThingFlags(left) & GCF_MUTABLE)) {
        /* We must copy if left does not own a buffer to realloc. */
        ln = JSSTRING_LENGTH(left);
        if (ln == 0)
            return right;
        ls = JSSTRING_CHARS(left);
        s = (jschar *) JS_malloc(cx, (ln + rn + 1) * sizeof(jschar));
        if (!s)
            return NULL;
        js_strncpy(s, ls, ln);
        ldep = NULL;
    } else {
        /* We can realloc left's space and make it depend on our result. */
        ln = left->length;
        if (ln == 0)
            return right;
        ls = left->chars;
        s = (jschar *) JS_realloc(cx, ls, (ln + rn + 1) * sizeof(jschar));
        if (!s)
            return NULL;
        /* Take care: right could depend on left! */
        lrdist = (size_t)(rs - ls);
        if (lrdist < ln)
            rs = s + lrdist;
        left->chars = ls = s;
        ldep = JSSTRDEP(left);
    }
    js_strncpy(s + ln, rs, rn);
    n = ln + rn;
    s[n] = 0;
    str = js_NewString(cx, s, n, GCF_MUTABLE);
    if (!str) {
        /* Out of memory: clean up any space we (re-)allocated. */
        if (!ldep) {
            JS_free(cx, s);
        } else {
            s = JS_realloc(cx, ls, (ln + 1) * sizeof(jschar));
            if (s)
                left->chars = s;
        }
    } else {
        /* Morph left into a dependent prefix if we realloc'd its buffer. */
        if (ldep) {
            JSPREFIX_SET_LENGTH(ldep, ln);
            JSPREFIX_SET_BASE(ldep, str);
#ifdef DEBUG
          {
            JSRuntime *rt = cx->runtime;
            JS_RUNTIME_METER(rt, liveDependentStrings);
            JS_RUNTIME_METER(rt, totalDependentStrings);
            JS_LOCK_RUNTIME_VOID(rt,
                (rt->strdepLengthSum += (double)ln,
                 rt->strdepLengthSquaredSum += (double)ln * (double)ln));
          }
#endif
        }
    }
    return str;
}

The key to Mozilla's performance is this line, and the surrounding concept of "dependant" strings:

s = (jschar *) JS_realloc(cx, ls, (ln + rn + 1) * sizeof(jschar));

In a nutshell, when you perform the operation A = B + C, where B and C are strings, it reallocates the memory assigned to B -- depending upon the memory manager and fragmentation, this can often be done in place with no memory copy -- setting B to indicate that it is now a dependant string (e.g. "don't free the associated memory when you garbage collect B"), then copying C to the tail.

String Concatenation



String Concatenation

A and B are now pointing to the same location in memory. In a loop where a string is adding to itself, it is just constantly realloc'ing -- copying to a new memory location when the memory manager can't expand the requested amount in place, otherwise just doing a land grab of memory trailing the string -- and copying the new data on the end.

Changing this bit of code to do a straight new malloc and copy on all concatenations yielded an executable with Internet Explorer-like glacially slow string concats, exactly as expected.

Which made me curious: What if we predictively accommodated the often growing nature of strings, allocating a bit of extra space at the outset, making it more likely that realloc's could be accommodated in place? In the grand scheme of things, the memory taken by JavaScript strings is marginally small, so a little extra padding on result-of-concatenation strings shouldn't hurt.

I implemented the following function in js/jsstr.c.

size_t js_GetNextSize(size_t desired_size)
{
  int max = 0, i;
  
  for (i = 1; i != JSSTRFLAG_DEPENDENT; i <<= 1)
  {
    if (desired_size & i)
    {
      max = i;
    }
  }
  if (max)
  {
    return max << 1;
  } else
  {
    return desired_size;
  }
}

That's nothing more than a trivial implementation of ceil(log2(n)), however I tried to implement in the most transparent, cross-platform way possible.

Now I changed the realloc line referenced above to read:

s = (jschar *) JS_realloc(cx, ls, js_GetNextSize((ln + rn + 1) * sizeof(jschar)));

All reallocs will be to the next power of 2 of the requested size -- if a realloc demands 37 bytes, it'll allocate 64 bytes, and if it demands 7 bytes then it'll allocate 8 -- trading a relatively small amount of memory usage for processing time and memory bandwidth.

Re-running the benchmark linked earlier now shows my custom build of Firefox completing 131,072 iterations in 60 to 90ms (versus 1984ms before, so it completed it in less than 1/20th the time of the already speedy Firefox. Internet Explorer would take minutes to perform that many iterations). Firefox memory usage has not changed to a measurable degree, even during heavy browsing.

This change -- allocating a little extra space to accommodate realloc calls -- is likely the technique used in Opera 9, as this turbo version of Firefox was neck and neck with the speedy Opera 9 on this bit of functionality.

Now to delve through other parts of the excellent JavaScript code...

Reader Comments

I hope you checked your custom code back in?
Berislav Lopac @ 5/6/2007 12:35:23 PM
Hi, nice post. Couple of things to point out:

1. Rounding up to the next power of two can indeed waste a lot of memory. You won't see it with normal Firefox usage, and it will help these synthetic benchmarks (reducing the dominant cost from quadratic to linear complexity). But worst case it could tie up twice the string size (see below), and for a very large string, that could be the rest of available VM.

2. For built-in ceiling-log2, see

http://lxr.mozilla.org/mozilla/source/js/src/jslog2.c#46

Your implementation always doubles max, so if the string size should happen to be a power of two, you'll overallocate. It wasn't clear this was intended, but it's probably better to wait till one more js_ConcatStrings and then grab the extra space.

If you would like to create a bug, please go to https://bugzilla.mozilla.org/ and enter a new bug in the Core product, the JavaScript Engine component. We can continue from there. Thanks,

/be
Brendan Eich @ 5/6/2007 3:39:04 PM
I think you're supposed to do string concatenation as an Array join in JavaScript for best performance. Have you benchmarked that?
Jeff Atwood @ 5/6/2007 4:28:18 PM
The array join is the StringBuilder simile that I mentioned in the post -- it's a sort of hack.

The most interesting thing is that the join is slower on Firefox than string concatenation, and it's *much* slower on my modified version of Firefox as well as Opera. It's faster on Internet Explorer, however, due to the terrible concatenation performance in that browser.
Dennis Forbes @ 5/8/2007 1:36:06 PM
You can amortize the cost of reallocation by doubling the currently allocated space whenever it is insufficient.
@ 5/15/2007 3:07:10 PM

Add Comment

Name *:

Email Address:

(your email address is not displayed)
Website:

Comment *:


Dennis Forbes