Eric Lee Green
Eric's Home Page

Why You Should Avoid Threads With A Passion

My first design for BRU-Pro's tape server was a multi-threaded monolithic server. After all, that's the fastest/most standard method of organizing a server, right? That's what all the 'Softies say anyhow, and you know Microsoft is never wrong, correct?

My initial prototype, however, proved that this approach was wrong-headed and evil. Here are my observations:

  1. Resource management sucks. Resource management in multi-threaded applications is a pain in the #$%!@. Using semaphores to protect every global variable is a PITA. Now I hear you say, "well, you shouldn't have global variables!". But for some things, it's the most sensible way of doing things. For example, if I instantiate a cryptrand object (which opens a filehandle, does all sorts of magical things, and otherwise has a very hefty startup time), I want to do it *ONCE* and then grab random numbers from it as needed, rather than paying that startup cost every time I need random numbers.
  2. Killing threads sucks. This is part and parcel of the resource management problem. Leaving a semaphore held on a shared structure will cause the whole program to lock solid the next time someone needs that resource. I direct you to this text from the msdn.microsoft.com web site:
    • TerminateThread: TerminateThread can result in the following problems:
      • If the target thread owns a critical section, the critical section will not be released.
      • If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
      • If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL.
    What this basically means is that you can't kill a thread. You can ask a thread nicely to kill itself, but you can't otherwise control its execution once it's started running. The Unix process model, on the other hand, has a mechanism to send a signal to a running process and have it terminate after it tidies itself up. If, after a couple of seconds, it still hasn't terminated, you can terminate it with extreme prejudice and *ALL* resources allocated by that process are freed, even the semaphores.
  3. It encourages monolithic programs. Monolithic programs are Evil(*). Componentized programs are the Real Deal.
  4. Threaded programs are hard to debug. Any thread can stomp on any other thread's memory, and you have the problem of hanging locks, etc. With processes, you have no stomping, and locks can be auto-released by the system when the process exits.
  5. It ain't Unix! Unix is based upon the principle of "many small tools chained together". Unix was the original component-based operating system, up until the day that the X window system came around and ruined it.
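The ask-nicely-then-terminate sequence from point 2 can be sketched in Python. This is only an illustration, not BRU-Pro code: the function name `stop_process` and the two-second default grace period are assumptions made for this example, and it uses nothing beyond the standard `subprocess` module on a Unix system.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds=2.0):
    """Ask a child process to exit; kill it if it has not gone away in time.

    `proc` is a subprocess.Popen object. Returns the child's return code,
    which on Unix is the negative signal number if a signal ended it.
    """
    proc.terminate()                  # sends SIGTERM: the child may tidy up first
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                   # sends SIGKILL: cannot be caught or ignored
        return proc.wait()            # reap it so no zombie is left behind


if __name__ == "__main__":
    # A throwaway child that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("child ended with return code", stop_process(child))
```

Whichever path is taken, the kernel reclaims the child's memory and open file descriptors when it exits, which is the property a killed thread cannot offer.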
Okay, so threads suck. So what IS the best way to organize a Unix program?

Here's a hint: fork runs at the same speed as spawn on Linux.

BRU-Pro was a classic three-box design:

  1. Client: This is the part the user interacts with. We had three different clients -- web client, CLI client, and GUI (GTK+) client. All clients were written in Python, and called programs on the tape server to do the actual work.
  2. Tape server: This consisted of a secure execution server similar to an augmented 'ssh' (we used a Kerberos-style ticket system and would only execute programs within the 'sandbox', and only after stripping all magic shell characters from the command line to be executed), and a bunch of programs that the client called to do the actual work. The programs ranged in size from tiny 50-line applications that did nothing but set a flag in a database record, to several-thousand-line backup and restore programs that forked off a half-dozen processes. Every program executed directly was a Python program, though "C" modules or external "C" programs were called for everything that was CPU intensive or required low-level access that could not be done via Python.
  3. Backup agent: This lived on the machines to be backed up (or restored). This was yet another secure execution server, though much simpler than the tape server's because it did not have to do ticketing (the public key encryption used verified that only the tape server could contact it, and any extensive security checking was done on the tape server side, not on the backup agent side). Again, the tape server did its work by calling programs on the backup agent via 'bprsh' (which worked similarly to 'rsh' but was actually secure, unlike 'rsh'). The agent was solely "C" and /bin/sh because it had to actually fit on a recovery floppy. Thus the emphasis upon simplicity -- "C" is a pain to program large programs in.
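The two checks the tape server's execution server made before running anything (strip magic shell characters, stay inside the sandbox) might look roughly like this. It is a sketch under stated assumptions, not the BRU-Pro implementation: the sandbox path, the exact set of "magic" characters, and the names `strip_magic` and `resolve_in_sandbox` are all invented for illustration, and the ticket check is left out entirely.

```python
import os

# Hypothetical sandbox location; the real one is not given in the text.
SANDBOX_DIR = "/opt/example/sandbox"

# An assumed set of characters the shell treats specially.
SHELL_MAGIC = set(";&|<>$`\\\"'*?[](){}!~#\n\r")


def strip_magic(command_line):
    """Return the command line with every magic shell character removed."""
    return "".join(ch for ch in command_line if ch not in SHELL_MAGIC)


def resolve_in_sandbox(command_line, sandbox=SANDBOX_DIR):
    """Turn a requested command into an argv list, or refuse it.

    The program name is resolved relative to the sandbox directory and
    rejected if the resolved path ends up anywhere outside it.
    """
    argv = strip_magic(command_line).split()
    if not argv:
        raise ValueError("empty command")
    root = os.path.realpath(sandbox)
    program = os.path.realpath(os.path.join(root, argv[0]))
    inside = os.path.commonpath([root, program]) == root and program != root
    if not inside:
        raise PermissionError("refusing to run program outside the sandbox")
    return [program] + argv[1:]


if __name__ == "__main__":
    print(resolve_in_sandbox("setflag job42; rm -rf /"))
```

Returning an argv list rather than a string means the caller can hand it straight to an exec-style call without a shell ever parsing it, so the stripping is a second line of defense rather than the only one.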
So basically, BRU-Pro was a set of programs being run via 'ssh' commands. Now let's look at the common arguments against that:
  1. Critic: Forking all those processes is slow. Fact: On Linux, fork() takes the same amount of time as spawn(). It is exec() that is the time consumer. For the tape server, what we did was pre-load all commonly used modules into the CTSP (Client-Tape Server Protocol) server. When one of those commonly-used modules was executed via the 'bsh' command from the client, the CTSP server forked, but did not exec -- it instead called the main() entry point of the pre-loaded module.
  2. Critic: Response time is more sluggish. Fact: On any three-box architecture, response time is going to be sluggish. If you wish to query the files on a particular machine, for example, first the client must send the request to the tape server, then the tape server must send the request to that particular machine, and then the results get passed back via the same two hops. Some backup systems, such as NDMP, bypass that by allowing clients to talk directly to end user machines. In my opinion that is either a security nightmare or a programming nightmare -- backup agents should be small enough to fit on a recovery disk, they should not have everything and the kitchen sink thrown into them.

    I benchmarked the servelets on a Celeron 300 (not even a 300a), and we managed over 8 rsh operations per second. On a Celeron 300. You aren't going to click your mouse 8 times per second in normal use (and each operation did a big chunk of work, a mouse click was basically one operation).

    The fact of the matter was that any sluggishness in BRU-Pro was either a) because of inefficiencies in the way we did things, or b) inherent in the three-box architecture. We knew about the inefficiencies but did not have time to fix them for release 1.0, where we were more concerned about being right than being fast; 1.1 fixed most of them. The organization of the program had nothing to do with it.

  3. Critic: Startup time of those big programs is slow! Fact: Those big programs were pre-loaded. We wasted no time loading them off of disk. Their startup time would have been the same whether they were forked or spawned.
  4. Critic: All this forking increases the memory footprint! *FINALLY* a well-warranted claim! This one was true. While all modern Unix variants employ copy-on-write semantics for their fork() call, meaning that memory is shared until such time as it is written to, the fact is that the memory footprint IS larger than in a threaded design. But 256 megabytes of memory now sells for under $100. The point? I'd rather have reliability than small footprint in today's world. The days when I hand-optimized 6502 assembly code to fit into a 4k area of memory are long gone.
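The fork-but-don't-exec trick from point 1 can be sketched in a few lines of Python. This is not the CTSP server, just a minimal illustration of the idea: the `PRELOADED` table, the `run_preloaded` function, and the two toy modules are hypothetical names made up for this example. It needs a Unix system, since it calls `os.fork()` directly.

```python
import os

SHARED_STATE = ["parent"]          # lives in the server process


def count_args(argv):
    """Toy module: its exit status is the number of arguments."""
    return len(argv)


def scribble(argv):
    """Toy module: stomps on a global, but only in its own copy of memory."""
    SHARED_STATE.append("child was here")
    return 0


# Hypothetical table of modules loaded once, at server startup.
PRELOADED = {"count_args": count_args, "scribble": scribble}


def run_preloaded(name, argv):
    """Fork, then call the pre-loaded main() directly instead of exec()ing."""
    main = PRELOADED[name]
    pid = os.fork()
    if pid == 0:
        status = 1                 # reported if main() raises
        try:
            status = int(main(argv) or 0) & 0xFF
        finally:
            os._exit(status)       # never fall back into the server's code
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return -os.WTERMSIG(wait_status)


if __name__ == "__main__":
    print(run_preloaded("count_args", ["a", "b", "c"]))
    print(run_preloaded("scribble", []))
    print(SHARED_STATE)            # the parent's copy is untouched
```

The `scribble` module shows the copy-on-write behavior from point 4: the child's write lands in its own private page, so the server's `SHARED_STATE` never changes no matter how badly a module misbehaves.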
So:
  1. Threads are evil.
  2. Secure remote execution servelets are good.
  3. A true Unix program is "many small programs chained together".
So what are you waiting for? Burn your threads and fork it!

-- Eric


(*) Evil(tm) is a trademark of Microsoft Corporation and is used without permission.


Note that everything on this page is Copyright 1997-2003 Eric Lee Green and represents my own opinions and nobody else's. Reproduction without permission strictly prohibited.

Created with PHP 4. Last modified Fri, 06 Dec 2002 10:27:39 -0500.