Dennis Forbes on Pragmatic Software Development
 
Monday, September 17 2007

One of my PCs is a bit of a Frankenstein, having gone through countless small upgrades over the years.

A video card here. Some memory modules there. A replacement primary harddrive here (thank you g4u). A supplementary hard drive there. Half a dozen different CD and then DVD and then Dual-Layer DVD burners.

Every now and then it'd see a larger upgrade that mandated a motherboard replacement alongside a new CPU. Often that would require new memory modules as well. Maybe even a new power supply as connection standards changed.

Motherboard replacements have always been the most disruptive, and it's been interesting to watch as each has negated the need for some add-in or other. First the USB+firewire board got punted, having been replaced by onboard functionality. Then the network card. Then the Soundblaster card. The only true add-in card usually needed nowadays is the video card, and I'm sure it's only a matter of time before the on-board video reaches a credible level of performance, eliminating even that.

I've pursued this piecemeal approach to upgrading primarily because it minimized the software disruption in my life, usually requiring just a quick module swap and some driver updates before it's up and running again. I actually enjoy the modular, hybrid-PC pursuit, individually scoping out and replacing components with the best bang-per-dollar option available at the time. It's a bit of a hobby.

[Clearly I'm not alone: A local "Tiger Direct" store opened recently in my town, featuring a huge floorspace stocked with esoteric power supplies, mod cases, and other components for DIY builders. I'm surprised that the demand is still there, having thought that the self-builder was an endangered species.]

I've been negligent, however. Over the past while this PC had seen little attention. Running on an extremely dated Athlon XP 1800+ (overclocked to equal a 2200+), with a "measly" 1GB of DDR1 RAM and a dated collection of complementary components, it had fallen so far behind the times that it had dropped off the current CPU charts entirely. While it served its casual gaming task well (the video card is quite contemporary, and given that few games are constrained by the CPU, it held its own), and admirably provided the network storage for photos and videos, its anemic standings were a bit embarrassing. Sure, it didn't need to be decent given the various home and business laptops -- powerful, modern units that saw most of my computing activity -- but I felt like I was letting it down.

So, following up on the entry from a couple of weeks ago, I finally got around to ordering a new CPU and motherboard on Tuesday. The CPU is a retail boxed Intel Core 2 Quad Q6600 2.4GHz processor from Direct Canada, at the extraordinarily low price of $279.99 CAD. I'd been directed to their site by a search-engine link to "Shopbot.ca", so I was a bit wary of placing my order with this unfamiliar provider, but at 1pm the next day the box arrived at my door, amazingly delivered less than 24 hours after I ordered, coming from a shop 3000km away. I'm very satisfied with the price and speed. (I received no considerations for that comment, and know nothing about the shop beyond the fact that they sold me a killer piece of hardware at a great price, delivering it very quickly. Your mileage may vary.)

In the end I discovered that some new memory modules would be in order to get the full speed out of it (going with 2GB to match the oft-claimed speed advantage, a claim that frequently flies in complete contradiction to actual memory usage metering). Oh, and a new case, as it might make the whole process a little easier.

Ultimately, the only legacy pieces that made the migration to the "upgraded" box were the hard drives and the video card.

Minutes later the full-retail copy of Windows was running the right drivers, and after a quick re-activation it was storming along.

I booted up.

In a word (and a punctuation mark) - Wow!

What a tremendous amount of computational power on the cheap. Day to day activity really feels no different than it did before -- browsing is the same fast browsing that it was before, and given that I don't try to use Excel as a warehousing database, Office seems the same as well. Battlefield 2 plays the same given that I have the same video card, albeit now with absolutely zero stutters or hiccups as other threads demanding timeslices are generally satisfied by one of the other cores.

For the things that actually keep me waiting -- encoding a home video from the MiniDV, or building Firefox from CVS, as I do regularly -- the improvement is enormous. Not only are these operations massively sped up by the four cores available to them, better still I can configure them to use only one, two, or three threads of parallel execution (via the -j build option for Firefox, for instance), constraining them as a coarse fix for the deficiencies of the Windows scheduler. I can now run a full Firefox 3 build in just 12 minutes with full parallelism, or run it (or other demanding applications) with little or no impact on the usability and functionality of this PC for other tasks.

Parallel Building Firefox: Full Build Times on Quad Core Processor
(Shorter is better)

Build option            Full build time
-j1 (default)           24 minutes 12 seconds
-j2 (1 to 2 threads)    16 minutes 52 seconds
-j4 (1 to 4 threads)    14 minutes 34 seconds
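
To put the build times listed above in perspective, here is a minimal Python sketch that converts them to seconds and works out the speedup over -j1 and the parallel efficiency (speedup divided by the number of jobs). The names BUILD_TIMES, speedup, and efficiency are made up for illustration; only the three timings come from the measurements.

```python
# Build times from the measurements above, converted to seconds.
# Keys are the -j values; the dictionary name is hypothetical.
BUILD_TIMES = {
    1: 24 * 60 + 12,  # -j1
    2: 16 * 60 + 52,  # -j2
    4: 14 * 60 + 34,  # -j4
}


def speedup(times, jobs):
    """How many times faster the -j<jobs> build is than the -j1 build."""
    return times[1] / times[jobs]


def efficiency(times, jobs):
    """Speedup per job: 1.0 would mean perfect scaling."""
    return speedup(times, jobs) / jobs


for jobs in sorted(BUILD_TIMES):
    print(
        f"-j{jobs}: {BUILD_TIMES[jobs]} s, "
        f"speedup {speedup(BUILD_TIMES, jobs):.2f}x, "
        f"efficiency {efficiency(BUILD_TIMES, jobs):.2f}"
    )
```

By this measure -j2 is about 1.43 times faster than -j1 and -j4 about 1.66 times faster, so the efficiency falls from roughly 0.72 at two jobs to roughly 0.42 at four.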

The build continued to speed up with more possible parallel operations, albeit with a decreasing rate of return, with the fastest test build occurring in just over 12 minutes with the highest option tested: -j12. Having more parallel operations than cores can yield benefits when it keeps a bottleneck resource busier, which in this case was the hard drive. At this point the cores were left twiddling their thumbs waiting for the storage to catch up.
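
One rough way to reason about the diminishing returns is Amdahl's law, which treats a job as a serial part plus a perfectly parallel part. That model is an assumption brought in here for illustration, not something measured in these tests, and it ignores the hard drive contention entirely. The sketch below fits the parallel fraction from the -j1 and -j2 times and then predicts the -j4 time; the function names are hypothetical.

```python
# Measured full-build times in seconds (from the -j1, -j2 and -j4 runs).
T1 = 24 * 60 + 12
T2 = 16 * 60 + 52
T4 = 14 * 60 + 34


def parallel_fraction(t1, tn, n):
    """Solve Amdahl's law, tn = t1 * ((1 - p) + p / n), for p."""
    return (1 - tn / t1) * n / (n - 1)


def predicted_time(t1, p, n):
    """Time Amdahl's law predicts for n parallel jobs."""
    return t1 * ((1 - p) + p / n)


# Fit the model on the -j2 run, then see how well it predicts -j4.
p = parallel_fraction(T1, T2, 2)
print(f"fitted parallel fraction: {p:.3f}")
print(f"predicted -j4 time: {predicted_time(T1, p, 4):.0f} s")
print(f"measured  -j4 time: {T4} s")
```

With these figures the model predicts roughly 792 seconds for -j4 against the 874 seconds measured. The real build falling short of even this simple model is consistent with something other than the CPU, here the hard drive, holding it back.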

Limiting the build process to two cores via the process CPU affinity had it CPU starved beyond -j2, yielding no benefit via more parallelism.
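
How the affinity was set for that test isn't described, but a CPU affinity is normally expressed as a bitmask with one bit per core. As a minimal sketch, assuming cores are numbered from 0, the Python below turns a list of core numbers into that mask and composes a cmd.exe `start /affinity` line, which takes the mask in hexadecimal on Windows versions that support the switch. Both helper names and the example `make -j4` command are hypothetical.

```python
def affinity_mask(cores):
    """Return an integer bitmask with one bit set per core number (0-based)."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core numbers must be non-negative")
        mask |= 1 << core
    return mask


def start_command(cores, program):
    """Hypothetical helper: build a cmd.exe 'start /affinity' command line.

    The mask is written in hexadecimal, as 'start /affinity' expects.
    """
    return f"start /affinity {affinity_mask(cores):X} {program}"


# Restrict an illustrative build command to the first two cores.
print(start_command([0, 1], "make -j4"))
```

Cores 0 and 1 give a mask of 3, and all four cores give F, so the two-core restriction described above corresponds to a mask of 3.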

You can find a stacked graph detailing core processor usage for the above -j4 run (on 4 cores) at http://www.yafla.com/dforbes/images/Firefox_build_j4_4core.png. You can also look at a chart of building Firefox using the -j4 option, but setting the processor affinity to only allow the build access to two cores.

Not only is the build performance fantastic, but better still I can throttle it back to only run at most two parallel operations (-j2), getting a build in a still impressive 17 minutes while leaving two cores completely available for other tasks, like browsing the web with full responsiveness. I can even launch Battlefield 2, and remarkably it plays flawlessly...despite the fact that a full-scale, parallel build is going on in the background.

(Sidenote: Threads can still be left stalled, stranded waiting for a shared resource like the limited memory bandwidth and I/O paths, for instance. In the sample above my build was on a second harddrive -- a configuration that I recommend for all power users -- and clearly the other shared resources didn't impact the game to a perceivable degree)

What a revolution in computer usage. What a discount-priced computational powerhouse.

Reader Comments

Congrats on your new baby :)

By the way, your Firefox compilation benchmark is exactly what I was talking about in my dual-vs-quad post.

1 thread - 1452 sec (baseline)
2 threads - 1012 sec (70% of baseline)
4 threads - 874 sec (60% of baseline)

Adding core #2 gives us a substantial 30% improvement. But adding cores #3 and #4 only gives us an additional and highly incremental 10% improvement.

Based on that, you can predict how insignificant adding cores #5-#8 would be.

That's why parallelization, beyond the low-hanging fruit of core #2, is a very tough nut to crack.
Jeff Atwood @ 9/17/2007 4:22:51 PM

Good article.

What would be interesting (to me anyway), would be to run the j1 and j2 options on a dual core in the same machine.

If the cores were running at the same speed (1.8Ghz or whatever), would the build times be the same for both and would the difference between j1 and j2 be of the same order for both processors?

C
Carl @ 9/27/2007 8:11:36 AM

Thank you for this entry. Based upon your enthusiasm and supporting info, I upgraded to a Q6700 Kentsfield based system and have been overwhelmingly satisfied.
Warren Shaw @ 11/6/2007 2:44:57 PM

Dennis Forbes - Dennis Forbes is a Toronto-based software architect and technology writer